Shrutam-2 is a LLM based automatic speech recognition system for 12 major Indian languages. It bridges a Conformer speech encoder with a pretrained LLM decoder through a Mixture-of-Experts (MoE) projection layer, enabling high-quality, prompt-controllable transcription across diverse Indic languages.
Unlike conventional CTC/Attention ASR systems that map audio directly to text tokens, Shrutam-2 reframes speech recognition as a conditional language generation task. A speech encoder produces frame-level audio representations, which are then projected into the LLM's embedding space and fed to a frozen LLM decoder alongside a text prompt.
Attribution-Non-Commercial 4.0 International (CC BY-NC 4.0)
bharatgenai
Automatic Speech Recognition Model
PyTorch
Restricted
Sector Agnostic
18/05/26 11:24:38
8.37 GB
13 files, 1 directories
Attribution-Non-Commercial 4.0 International (CC BY-NC 4.0)
© 2026 - Copyright AIKosh. All rights reserved.