This Automatic Speech Recognition (ASR) model transcribes Bengali speech from 16 kHz mono WAV audio files into text.
This Automatic Speech Recognition (ASR) model is designed for Bengali speech recognition. It takes 16 kHz (16,000 Hz) mono WAV audio files as input and converts spoken Bengali into text. The model is built on a Conformer-Large architecture with a 120M-parameter encoder and a hybrid CTC-RNNT decoder. It includes 17 conformer blocks, each with a model dimension of 512. It is intended for transcribing Bengali speech into text across a range of applications.
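Because the model expects 16 kHz mono WAV input, it can help to check a file's format before sending it for transcription. The following is a minimal sketch using only Python's standard `wave` module; the function name `check_wav_format` and the constants are hypothetical and are not part of the model's own tooling.

```python
import wave

# Input format stated in the model description (assumed to be strict requirements).
EXPECTED_RATE_HZ = 16_000
EXPECTED_CHANNELS = 1


def check_wav_format(path):
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != EXPECTED_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE_HZ} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"file has {channels} channels, expected mono")
    return problems
```

A file that fails this check would need to be resampled or downmixed with an audio tool of your choice before transcription; that step is outside the scope of this sketch.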
MIT
AI4Bharat
Automatic Speech Recognition
N.A.
Open
Sector Agnostic
21/02/25 13:21:49