ORGANISATION

Shrutilipi

Mining audio from parallel text and audio pairs

About Dataset

Shrutilipi Overview

Shrutilipi is a labelled ASR corpus obtained by mining parallel audio and text pairs at the document scale from All India Radio news bulletins for 12 Indian languages: Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi, Sanskrit, Tamil, Telugu, and Urdu. The corpus has over 6400 hours of data across all languages. This work is funded by Bhashini, MeitY and Nilekani Philanthropies.

Usage

The datasets library… See the full description on the dataset page: https://huggingface.co/datasets/ai4bharat/Shrutilipi.
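As a rough sketch of how the corpus might be loaded with the Hugging Face datasets library, the snippet below wraps `load_dataset` in a small helper. The repository id is taken from the dataset page URL above. The helper name `load_shrutilipi`, the optional config name, and the "train" split are assumptions made for illustration and are not confirmed by this description, so check the dataset page for the actual configuration and split names.

```python
# Repository id, taken from the dataset page URL given above.
REPO_ID = "ai4bharat/Shrutilipi"

# The 12 languages listed in the overview.
LANGUAGES = [
    "Bengali", "Gujarati", "Hindi", "Kannada", "Malayalam", "Marathi",
    "Odia", "Punjabi", "Sanskrit", "Tamil", "Telugu", "Urdu",
]


def load_shrutilipi(config_name=None, split="train", streaming=True):
    """Load the corpus from the Hugging Face Hub (hypothetical helper).

    ASSUMPTIONS: `config_name` (for example a per-language configuration)
    and the "train" split are placeholders; the real names must be read
    from the dataset page. Streaming is used by default because the
    corpus is described as over 6400 hours of audio.
    """
    # Imported lazily so the module can be inspected without the
    # `datasets` package installed.
    from datasets import load_dataset

    return load_dataset(
        REPO_ID,
        name=config_name,
        split=split,
        streaming=streaming,
    )
```

Calling `load_shrutilipi()` requires network access and the `datasets` package. With `streaming=True` the examples are fetched lazily as the result is iterated, which avoids downloading the whole corpus up front.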