The Dogri ASR (Automatic Speech Recognition) model transcribes 16 kHz mono-channel audio into text. It is based on a Conformer-Large architecture with 120M parameters and a hybrid CTC-RNNT decoder.
The Dogri ASR (Automatic Speech Recognition) model converts spoken Dogri into text from 16 kHz mono-channel audio files. It is built on a Conformer-Large architecture with 120 million parameters, 17 conformer blocks, and a model dimension of 512, and it uses a hybrid CTC-RNNT decoder for transcription. It is intended for speech-to-text applications in Dogri.
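Because the model expects 16 kHz mono-channel audio, it can help to check a file's format before sending it for transcription. The following is a minimal sketch using only the Python standard library. The function names `check_wav_format` and `write_silent_wav` are hypothetical helpers introduced for illustration and are not part of the model's own tooling; the sketch assumes the input is an uncompressed PCM WAV file.

```python
import wave

# Input format stated in the model description: 16 kHz, mono.
REQUIRED_RATE_HZ = 16000
REQUIRED_CHANNELS = 1


def check_wav_format(path):
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != REQUIRED_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz, expected {REQUIRED_RATE_HZ} Hz")
    if channels != REQUIRED_CHANNELS:
        problems.append(f"found {channels} channels, expected {REQUIRED_CHANNELS}")
    return problems


def write_silent_wav(path, rate_hz, channels, seconds=0.1):
    """Write a short silent 16-bit PCM WAV, used here only to exercise the check."""
    n_frames = int(rate_hz * seconds)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * n_frames * channels)
```

A file that fails the check would need to be resampled or downmixed with an audio tool of your choice before transcription; that step is outside the scope of this sketch.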
MIT
AI4Bharat
Automatic Speech Recognition
N.A.
Open
Sector Agnostic
21/02/25 13:21:47
0
MIT
© 2026 - Copyright AIKosh. All rights reserved. This portal is developed by National e-Governance Division for AIKosh mission.