
Hindi speech dataset from phone recordings for ASR
The Gram Vaani Hindi ASR dataset consists of telephone-quality Hindi speech recordings covering a wide range of dialects across India. It includes approximately 1,000 hours of unlabelled audio and 105 hours of labelled audio with transcriptions, all collected via the Mobile Vaani platform. Accompanying metadata covers speaker location, dialect, emotion, and audio quality.
Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)