
IndicVoices-R

IndicVoices-R is a large-scale, multilingual, multi-speaker speech dataset for Text-to-Speech (TTS) research in Indian languages.

About Dataset

IndicVoices-R is a multilingual speech dataset designed for Text-to-Speech (TTS) research, covering 22 Indian languages. It includes over 1,700 hours of high-quality, spontaneous speech from more than 10,000 speakers. The recordings have been processed to enhance speech clarity and remove background noise, which makes the dataset well suited to training TTS models and evaluating speaker generalization. It supports zero-shot, few-shot, and many-shot evaluation settings for robust TTS model development.