Automatic Speech Recognition (ASR) model for Tamil that transcribes spoken audio into text.
This Automatic Speech Recognition (ASR) model for Tamil was developed with the Icefall toolkit using the Zipformer architecture. It was trained on approximately 370 hours of labelled 16 kHz speech, including naturally occurring code-mixed speech, which supports robust recognition of bilingual Indian speech patterns. The system is based on a 65M-parameter Zipformer-Medium encoder paired with an RNN-T prediction network and joiner. Together they form a low-latency streaming ASR model with 16 encoder layers and a 512-dimensional representation.
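Because the model is described as trained on 16 kHz audio, it may help to check input files before transcription. The following is a minimal sketch using only the Python standard library. The function name `check_wav_for_asr` is hypothetical, and the single-channel (mono) requirement is an assumption that this listing does not state. It does not call Icefall or load the model.

```python
import wave

# The listing states that training used 16 kHz audio.
EXPECTED_SAMPLE_RATE = 16_000


def check_wav_for_asr(path, expected_rate=EXPECTED_SAMPLE_RATE):
    """Return a list of problems found in a WAV file.

    An empty list means the file matches the expected sample rate,
    is mono (an assumption, not stated in the listing), and is not empty.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        n_frames = wav.getnframes()

    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if channels != 1:
        problems.append(f"{channels} channels found, mono is assumed here")
    if n_frames == 0:
        problems.append("file contains no audio frames")
    return problems
```

A file that reports a different sample rate would need to be resampled to 16 kHz with an audio tool of your choice before being passed to the model.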
Attribution 4.0 International (CC BY 4.0)
SPRING LAB IITM
Speech-to-text Conversion
PyTorch
Open
Science, Technology and Research
09/01/26 06:40:29
260.42 MB
© 2026 - Copyright AIKosh. All rights reserved. This portal is developed by National e-Governance Division for AIKosh mission.