This Automatic Speech Recognition (ASR) model transcribes Assamese speech from 16 kHz mono WAV audio files into text.
The model is designed for Assamese speech recognition: it takes 16 kHz mono WAV audio files as input and converts spoken Assamese into text. It is based on a Conformer-Large architecture with a 120M-parameter encoder and a hybrid CTC-RNNT decoder. It includes 17 conformer blocks, each with a model dimension of 512, and is intended for general-purpose transcription of Assamese speech.
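Because the model expects 16 kHz mono WAV input, it can help to verify a file's format before sending it for transcription. The following is a minimal sketch using only Python's standard `wave` module; the function name `check_wav_format` and the constant names are hypothetical and not part of the model's own tooling, and the check does not resample or convert audio.

```python
import wave

# Input format stated in the model description: 16 kHz, single channel.
EXPECTED_SAMPLE_RATE_HZ = 16_000
EXPECTED_CHANNELS = 1


def check_wav_format(path):
    """Return a list of format problems; an empty list means the file matches.

    Hypothetical helper for illustration only. It reads the WAV header and
    compares the sample rate and channel count with the expected values.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        channels = wav.getnchannels()
    if sample_rate != EXPECTED_SAMPLE_RATE_HZ:
        problems.append(
            f"sample rate is {sample_rate} Hz, expected {EXPECTED_SAMPLE_RATE_HZ} Hz"
        )
    if channels != EXPECTED_CHANNELS:
        problems.append(
            f"channel count is {channels}, expected {EXPECTED_CHANNELS} (mono)"
        )
    return problems
```

A file that fails either check would need to be resampled or downmixed with an audio tool of your choice before transcription.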
MIT
AI4Bharat
Automatic Speech Recognition
N.A.
Open
Sector Agnostic
21/02/25 13:21:50
0
MIT
© 2026 - Copyright AIKosh. All rights reserved. This portal is developed by National e-Governance Division for AIKosh mission.