Shrutilipi

Mining audio from parallel text and audio pairs

About Dataset

Shrutilipi Overview

Shrutilipi is a labelled ASR corpus obtained by mining parallel audio and text pairs at the document scale from All India Radio news bulletins for 12 Indian languages: Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi, Sanskrit, Tamil, Telugu, and Urdu. The corpus has over 6400 hours of data across all languages. This work is funded by Bhashini, MeitY and Nilekani Philanthropies.

Usage

The datasets library…

See the full description on the dataset page: https://huggingface.co/datasets/ai4bharat/Shrutilipi.

Activity Overview

  • Downloads 0
  • Redirect 16
  • Views 116
  • File Size 0

Tags

  • Speech Dataset
  • Indian Language
  • Natural Language Processing (NLP)

License Control

Attribution 4.0 International (CC BY 4.0)