Automatic Speech Recognition (ASR) model for Odia that transcribes spoken audio into text.
Automatic Speech Recognition (ASR) model for Odia, developed with the Icefall toolkit using the Zipformer architecture. The model is trained on approximately 100 hours of labelled 16 kHz speech, including naturally occurring code-mixed speech, which supports robust recognition of bilingual Indian speech patterns. The system pairs a 65M-parameter Zipformer-Medium encoder (16 encoder layers, 512-dimensional representation) with an RNN-T prediction network and joiner, forming a low-latency streaming ASR model.
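Because the model is trained on 16 kHz audio, it can help to confirm the sample rate of an input file before transcription. The following is a minimal sketch using only Python's standard `wave` module; the function name `check_wav_for_asr` and the returned field names are illustrative assumptions, not part of the model's or Icefall's API, and the description does not state what the model does with audio at other sample rates.

```python
import wave

# From the model description: the model is trained on 16 kHz audio.
EXPECTED_SAMPLE_RATE = 16_000


def check_wav_for_asr(path: str) -> dict:
    """Read a WAV header and report whether its sample rate is 16 kHz.

    Hypothetical helper for illustration only; it does not run the model.
    """
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        channels = wav.getnchannels()
        n_frames = wav.getnframes()
    return {
        "sample_rate": sample_rate,
        "channels": channels,
        "duration_s": n_frames / sample_rate,
        "matches_training_rate": sample_rate == EXPECTED_SAMPLE_RATE,
    }
```

If `matches_training_rate` is false, resampling the file to 16 kHz with an audio tool of your choice before transcription is a reasonable precaution.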
Attribution 4.0 International (CC BY 4.0)
SPRING LAB IITM
Speech-to-text Conversion
PyTorch
Open
Science, Technology and Research
12/12/25 07:32:04
253.36 MB
© 2026 - Copyright AIKosh. All rights reserved. This portal is developed by National e-Governance Division for AIKosh mission.