Mining audio from parallel text and audio pairs
Shrutilipi overview: Shrutilipi is a labelled ASR corpus obtained by mining parallel audio and text pairs at the document scale from All India Radio news bulletins for 12 Indian languages: Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi, Sanskrit, Tamil, Telugu, and Urdu. The corpus contains over 6,400 hours of data across all languages. This work is funded by Bhashini, MeitY, and Nilekani Philanthropies. Usage: The datasets library… See the full description on the dataset page: https://huggingface.co/datasets/ai4bharat/Shrutilipi.
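As a rough illustration of how the corpus might be loaded with the Hugging Face `datasets` library, here is a minimal sketch. The repository ID and the language list come from the description above. The assumptions are that each language is exposed as a lowercase configuration name (for example `hindi`) and that a `train` split exists. `config_name` and `load_shrutilipi` are hypothetical helpers written for this example, not part of any library. Check the dataset page for the actual configuration and split names before relying on this.

```python
# The 12 languages listed in the corpus description.
LANGUAGES = [
    "Bengali", "Gujarati", "Hindi", "Kannada", "Malayalam", "Marathi",
    "Odia", "Punjabi", "Sanskrit", "Tamil", "Telugu", "Urdu",
]

# Repository ID taken from the dataset page URL.
REPO_ID = "ai4bharat/Shrutilipi"


def config_name(language):
    """Map a language name to a dataset configuration name.

    ASSUMPTION: configurations are lowercase language names; verify this
    on the dataset page.
    """
    normalized = language.strip().capitalize()
    if normalized not in LANGUAGES:
        raise ValueError(f"Unsupported language: {language!r}")
    return normalized.lower()


def load_shrutilipi(language, split="train", streaming=True):
    """Hypothetical helper: load one language of the corpus.

    ASSUMPTION: a "train" split exists. Streaming avoids downloading the
    full audio for a language before iterating over it.
    """
    # Imported lazily so the helpers above work without the library
    # installed (pip install datasets).
    from datasets import load_dataset

    return load_dataset(
        REPO_ID,
        name=config_name(language),
        split=split,
        streaming=streaming,
    )


# Example usage (requires network access and the `datasets` package):
#   ds = load_shrutilipi("Hindi")
#   print(ds)
```

Streaming is the default in this sketch because the corpus holds over 6,400 hours of audio, so a full download for even a single language may be large.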
Attribution 4.0 International (CC BY 4.0)
© 2026 - Copyright AIKosh. All rights reserved. This portal is developed by National e-Governance Division for AIKosh mission.