Dhwani is India's first end-to-end trained speech Large Language Model (LLM). It understands speech directly, without a separate Automatic Speech Recognition (ASR) model, which avoids the error propagation of cascaded ASR-plus-LLM pipelines. It supports speech-to-text translation between English and multiple Indic languages.
Dhwani is an end-to-end trained speech LLM designed for Indic speech-to-text and multilingual speech translation. Developed by Krutrim AI Labs, it is built on the Krutrim-1 LLM and understands speech directly, without a separate ASR model. It uses a dual-encoder structure: Whisper's speech encoder processes speech inputs, and the BEATs audio encoder processes non-speech audio signals. A Window-Level Query Transformer (Q-Former) bridges the audio encoders and the text LLM. Low-Rank Adaptation (LoRA) fine-tuning aligns the audio-derived inputs with the textual output for speech recognition and translation. Dhwani supports English, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Tamil, and Telugu. Intended use cases include multilingual communication, media translation, education, healthcare, customer support, business, and legal applications. The reported evaluation shows high BLEU scores for both English-to-Indic and Indic-to-English translation.
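The window-level Q-Former idea described above can be illustrated with a toy sketch. This is not Dhwani's implementation: the function names (`cross_attend`, `window_level_qformer`), the window size, and the feature dimension are all hypothetical, and the learned projections, multiple heads, and stacked layers of a real Q-Former are omitted. It only shows the core mechanism, assuming the encoder outputs are already available as a list of frame vectors: the frames are split into fixed-size windows, and a small set of query vectors cross-attends over each window, so the LLM receives a number of audio tokens proportional to the audio length.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query, frames):
    """One query vector attends over the frames of a single window.

    Simplified single-head dot-product attention with no learned
    projections (an assumption made to keep the sketch short).
    """
    dim = len(query)
    scores = [
        sum(q * f for q, f in zip(query, frame)) / math.sqrt(dim)
        for frame in frames
    ]
    weights = softmax(scores)
    return [
        sum(w * frame[d] for w, frame in zip(weights, frames))
        for d in range(dim)
    ]


def window_level_qformer(frames, queries, window_size):
    """Emit len(queries) output tokens for every window of encoder frames."""
    tokens = []
    for start in range(0, len(frames), window_size):
        window = frames[start:start + window_size]
        tokens.extend(cross_attend(q, window) for q in queries)
    return tokens


# Toy input: 10 encoder frames of dimension 2 (hypothetical values).
frames = [[float(t), 1.0] for t in range(10)]
# A single all-zero query gives uniform attention, i.e. a per-window mean.
queries = [[0.0, 0.0]]
tokens = window_level_qformer(frames, queries, window_size=4)
# ceil(10 / 4) = 3 windows, one query each, so 3 tokens reach the LLM.
print(len(tokens), tokens)
```

With one query per window of four frames, ten frames yield three output tokens. In a trained model the queries would be learned parameters rather than zeros, so each token would be a content-dependent summary of its window instead of a plain average.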
Krutrim Community License Agreement Version 1.0
Ola Krutrim
Automatic Speech Recognition
N.A.
Open
Sector Agnostic
28/02/25 07:00:47