This Gujarati Automatic Speech Recognition (ASR) model transcribes 16 kHz mono-channel audio into text. It is built on a Conformer-Large architecture with 120M parameters and a hybrid CTC-RNNT decoder.
The model transcribes spoken Gujarati from 16 kHz mono-channel audio files. It uses a Conformer-Large architecture with 120 million parameters, 17 conformer blocks, and a model dimension of 512, paired with a hybrid CTC-RNNT decoder. It is intended for a range of Gujarati speech-to-text applications.
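Because the model expects 16 kHz mono-channel audio, it can help to check a file's format before sending it for transcription. The following is a minimal sketch using only Python's standard `wave` module, assuming the input is an uncompressed PCM WAV file. The function names `describe_wav` and `is_model_ready` are hypothetical helpers for illustration and are not part of the model's own tooling.

```python
import wave

# Input format stated in the model description.
EXPECTED_SAMPLE_RATE = 16_000  # Hz
EXPECTED_CHANNELS = 1          # mono


def describe_wav(path):
    """Return (sample_rate, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / sample_rate
    return sample_rate, channels, duration


def is_model_ready(path):
    """Return True if the file is already 16 kHz mono (hypothetical helper)."""
    sample_rate, channels, _ = describe_wav(path)
    return sample_rate == EXPECTED_SAMPLE_RATE and channels == EXPECTED_CHANNELS
```

A file that fails this check (for example, 44.1 kHz stereo) would need to be resampled and downmixed with an audio tool of your choice before transcription.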
MIT
AI4Bharat
Automatic Speech Recognition
N.A.
Open
Sector Agnostic
21/02/25 13:21:46