This model takes mono-channel WAV audio sampled at 16,000 Hz and outputs the transcribed text of the speech it contains.
This Automatic Speech Recognition (ASR) model is designed for Maithili speech recognition. It processes 16 kHz (16,000 Hz) mono WAV audio using a 120M-parameter Conformer-Large encoder with 17 blocks and a model dimension of 512.
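Because the model expects mono WAV audio at 16,000 Hz, it can help to check a file before submitting it for transcription. The following is a minimal sketch using only Python's standard `wave` module; the function name `check_wav_format` and the constants are hypothetical names chosen for illustration and are not part of the model's own tooling. It only inspects the WAV header and does not load or run the model.

```python
import wave

# Input requirements stated in the model description.
EXPECTED_RATE_HZ = 16_000
EXPECTED_CHANNELS = 1


def check_wav_format(path):
    """Return a list of format problems; an empty list means mono 16 kHz.

    Hypothetical helper for illustration only.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
    if channels != EXPECTED_CHANNELS:
        problems.append(
            f"expected {EXPECTED_CHANNELS} channel (mono), found {channels}"
        )
    if rate != EXPECTED_RATE_HZ:
        problems.append(f"expected {EXPECTED_RATE_HZ} Hz, found {rate} Hz")
    return problems
```

A file that reports problems would need to be downmixed or resampled with an audio tool of your choice before transcription; the standard `wave` module does not do resampling itself.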
MIT
AI4Bharat
Automatic Speech Recognition
N.A.
Open
Sector Agnostic
21/02/25 13:21:40