This Kannada Automatic Speech Recognition (ASR) model transcribes 16 kHz mono-channel audio into text. It uses a Conformer-Large architecture with 120M parameters and a hybrid CTC-RNNT decoder.
This Automatic Speech Recognition (ASR) model transcribes spoken Kannada into text from 16 kHz mono-channel audio input. It is built on a Conformer-Large architecture with 120 million parameters, consisting of 17 conformer blocks with a model dimension of 512. Decoding is handled by a hybrid CTC-RNNT decoder. The model is intended for Kannada speech-to-text applications.
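Because the model expects 16 kHz mono-channel audio, it can help to verify a recording's format before transcription. The following is a minimal sketch using only Python's standard `wave` module. The function name `check_wav_format` and the two constants are illustrative and not part of the model's own tooling, and the check assumes the input is an uncompressed WAV file.

```python
import wave

# Expected input format, taken from the model description above.
EXPECTED_SAMPLE_RATE = 16_000  # 16 kHz
EXPECTED_CHANNELS = 1          # mono


def check_wav_format(path):
    """Return a list of mismatches between a WAV file and the expected format.

    An empty list means the file is 16 kHz mono. This is a hypothetical
    helper for illustration; it does not resample or downmix the audio.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != EXPECTED_SAMPLE_RATE:
        problems.append(
            f"sample rate is {rate} Hz, expected {EXPECTED_SAMPLE_RATE} Hz"
        )
    if channels != EXPECTED_CHANNELS:
        problems.append(
            f"channel count is {channels}, expected {EXPECTED_CHANNELS}"
        )
    return problems
```

A file that fails this check would need to be resampled or downmixed with an audio tool of your choice before being passed to the model.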
MIT
AI4Bharat
Automatic Speech Recognition
N.A.
Open
Sector Agnostic
21/02/25 13:21:44
0