Towards building an Inclusive Multilingual Speech Dataset for Indian Languages
INDICVOICES is a dataset of natural, spontaneous speech totaling 23.7K hours of read (8%), extempore (76%), and conversational (15%) audio from 51K speakers, covering 400+ Indian districts and 22 languages. See the full description on the dataset page: https://huggingface.co/datasets/ai4bharat/IndicVoices.
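To give a rough sense of scale, the percentages above can be turned into approximate hours per speech type. The sketch below is only illustrative arithmetic: it assumes "23.7K hours" means 23,700 hours and treats the quoted percentages as exact, although they are evidently rounded (they sum to 99%). The names `TOTAL_HOURS`, `SHARES_PCT`, and `approx_hours` are hypothetical and not part of the dataset or any library.

```python
# Approximate hours per speech type, derived from the figures in the description.
# Assumption: "23.7K hours" is taken as 23,700 hours; percentages are rounded.
TOTAL_HOURS = 23_700
SHARES_PCT = {"read": 8, "extempore": 76, "conversational": 15}


def approx_hours(total_hours, shares_pct):
    """Return approximate hours per category using integer percentages."""
    return {name: total_hours * pct // 100 for name, pct in shares_pct.items()}


hours = approx_hours(TOTAL_HOURS, SHARES_PCT)
for name, value in hours.items():
    print(f"{name}: ~{value} hours")

# The quoted shares do not cover the full total because of rounding.
unaccounted = TOTAL_HOURS - sum(hours.values())
print(f"not covered by the rounded percentages: ~{unaccounted} hours")
```

Under these assumptions, extempore speech accounts for roughly 18,000 hours; the exact per-category figures should be taken from the dataset page rather than from this estimate.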
To Build Robust Speech Interfaces
Attribution 4.0 International (CC BY 4.0)