This Automatic Speech Recognition (ASR) model transcribes Kashmiri speech from 16 kHz mono WAV audio files into text.
This Automatic Speech Recognition (ASR) model is designed for Kashmiri speech recognition. It takes 16 kHz (16,000 Hz) mono WAV audio files as input and converts the spoken content into text. The model follows a Conformer-Large architecture, pairing a 120M-parameter encoder with a hybrid CTC-RNNT decoder. It uses 17 Conformer blocks with a model dimension of 512.
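Since the model expects 16 kHz mono WAV input, it can help to verify a file's format before transcription. The following is a minimal sketch using only Python's standard `wave` module; the function name `check_wav_format` and the constants are hypothetical helpers introduced for illustration and are not part of the model's own tooling.

```python
import wave

# Input format stated in the model description: 16 kHz, mono, WAV.
EXPECTED_SAMPLE_RATE_HZ = 16_000
EXPECTED_CHANNELS = 1


def check_wav_format(path):
    """Return a list of format problems; an empty list means the file matches.

    Hypothetical helper: it only inspects the WAV header and does not
    load or run the ASR model.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != EXPECTED_SAMPLE_RATE_HZ:
        problems.append(
            f"sample rate is {rate} Hz, expected {EXPECTED_SAMPLE_RATE_HZ} Hz"
        )
    if channels != EXPECTED_CHANNELS:
        problems.append(f"file has {channels} channels, expected mono (1)")
    return problems
```

A file that fails either check would need to be resampled or downmixed with an audio tool of your choice before being passed to the model.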
MIT
AI4Bharat
Automatic Speech Recognition
N.A.
Open
Sector Agnostic
21/02/25 13:21:43