Automatic Speech Recognition (ASR) model for Punjabi, processing audio and transcribing spoken content into text.
Automatic Speech Recognition (ASR) model for Punjabi, developed using the Icefall toolkit with the Zipformer architecture. The model is trained on approximately 400 hours of labelled Punjabi speech sampled at 16 kHz, including naturally occurring code-mixed speech, enabling robust recognition of bilingual Indian speech patterns. The system is based on a 65M-parameter Zipformer-Medium encoder paired with an RNN-T prediction network and joiner, forming a low-latency streaming ASR model with 16 encoder layers and a 512-dimensional representation.
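The description above names the main pieces of the transducer (RNN-T) setup: a streaming encoder, a prediction network, and a joiner. The following is a minimal PyTorch sketch of how those components fit together. Only the 512-dimensional encoder output follows the description; the LSTM stand-ins, vocabulary size, and feature dimension are illustrative assumptions, not the actual Icefall/Zipformer implementation.

```python
# Minimal, self-contained sketch of an RNN-T model: encoder + prediction
# network + joiner. Dimensions are illustrative except the 512-d encoder
# output mentioned in the model description.
import torch
import torch.nn as nn

class TinyTransducer(nn.Module):
    def __init__(self, num_mel_bins=80, enc_dim=512, pred_dim=512, vocab_size=500):
        super().__init__()
        # Stand-in for the Zipformer-Medium encoder (the real model has 16 layers).
        self.encoder = nn.LSTM(num_mel_bins, enc_dim, num_layers=2, batch_first=True)
        # Prediction network: consumes previously emitted tokens.
        self.embed = nn.Embedding(vocab_size, pred_dim)
        self.predictor = nn.LSTM(pred_dim, pred_dim, num_layers=1, batch_first=True)
        # Joiner: combines encoder frames and predictor states into token logits.
        self.proj_enc = nn.Linear(enc_dim, enc_dim)
        self.proj_pred = nn.Linear(pred_dim, enc_dim)
        self.joiner = nn.Sequential(nn.Tanh(), nn.Linear(enc_dim, vocab_size))

    def forward(self, feats, tokens):
        enc_out, _ = self.encoder(feats)                    # (B, T, enc_dim)
        pred_out, _ = self.predictor(self.embed(tokens))    # (B, U, pred_dim)
        # Broadcast-add every encoder frame against every predictor step.
        joint = self.proj_enc(enc_out).unsqueeze(2) + self.proj_pred(pred_out).unsqueeze(1)
        return self.joiner(joint)                           # (B, T, U, vocab_size)

# 80-dim filterbank features for ~1 second of 16 kHz audio (about 100 frames).
feats = torch.randn(1, 100, 80)
tokens = torch.randint(0, 500, (1, 5))
logits = TinyTransducer()(feats, tokens)
print(logits.shape)  # torch.Size([1, 100, 5, 500])
```

In a real transducer, the joiner's per-frame, per-token logits are trained with the RNN-T loss and decoded frame by frame, which is what makes low-latency streaming recognition possible.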
Attribution 4.0 International (CC BY 4.0)
SPRING LAB IITM
Speech-to-Text Conversion
PyTorch
Open
Science, Technology and Research
09/01/26 06:31:40
260.42 MB