The Bodo ASR (Automatic Speech Recognition) model transcribes 16 kHz mono audio into text. It is built on a Conformer-Large architecture with 120M parameters and uses a hybrid CTC-RNNT decoder.
The Bodo ASR (Automatic Speech Recognition) model transcribes spoken Bodo into text from 16 kHz mono audio files. It is based on a Conformer-Large architecture with 120 million parameters and uses a hybrid CTC-RNNT decoder, which is intended to improve transcription accuracy. The network has 17 conformer blocks and a model dimension of 512. It is aimed at applications that need Bodo speech transcription.
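Because the model expects 16 kHz mono input, it can help to check a file's format before sending it for transcription. The following is a minimal sketch using only Python's standard `wave` module. The function names `is_model_ready_wav` and `write_silence` are hypothetical helpers introduced for illustration and are not part of the model's own tooling. The sketch only inspects the WAV header and does not resample or transcribe anything.

```python
import os
import tempfile
import wave

# Input format stated in the model description.
EXPECTED_SAMPLE_RATE = 16_000  # Hz
EXPECTED_CHANNELS = 1          # mono


def is_model_ready_wav(path):
    """Return True if the WAV file at `path` is 16 kHz mono.

    Hypothetical helper: it reads the header only and does not
    validate the audio content itself.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_SAMPLE_RATE
            and wav.getnchannels() == EXPECTED_CHANNELS
        )


def write_silence(path, sample_rate, channels, seconds=1):
    """Write a silent 16-bit PCM WAV file, used here only to demo the check."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * sample_rate * channels * seconds)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        good = os.path.join(tmp, "mono_16k.wav")
        bad = os.path.join(tmp, "stereo_44k.wav")
        write_silence(good, 16_000, 1)
        write_silence(bad, 44_100, 2)
        print(good, is_model_ready_wav(good))
        print(bad, is_model_ready_wav(bad))
```

Files that fail the check would need to be downmixed and resampled with an audio tool of your choice before transcription.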
MIT
AI4Bharat
Automatic Speech Recognition
N.A.
Open
Sector Agnostic
21/02/25 13:21:48
0