MANGO TTS

MANGO is the first large-scale dataset designed for evaluating Text-to-Speech (TTS) systems in Indian languages.

About Dataset

Key Features:

  • 255,150 human ratings of TTS-generated outputs and ground-truth human speech.
  • Covers two major Indian languages, Hindi and Tamil, as well as English.
  • Based on the MUSHRA (Multiple Stimuli with Hidden Reference and Anchor) test methodology.
  • Ratings are provided on a continuous scale from 0 to 100, with discrete quality categories:
    • 100-80: Excellent
    • 80-60: Good
    • 60-40: Fair
    • 40-20: Poor
    • 20-0: Bad
  • Includes evaluations involving:
    • MUSHRA: with explicitly mentioned high-quality references.
    • MUSHRA-NMR: without explicitly mentioned high-quality references.
    • MUSHRA-DG: with detailed guidelines across fine-grained dimensions.
    • MUSHRA-DG-NMR: with detailed guidelines across fine-grained dimensions and without explicitly mentioned high-quality references.
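
The quality bands listed above can be applied to a raw score with a small helper. This is a minimal sketch, not part of the dataset tooling: the function name `mushra_category` is hypothetical, and because the listed bands share their boundary values (for example, 80 appears in both "Excellent" and "Good"), the sketch assumes that a boundary score falls into the higher band.

```python
def mushra_category(score: float) -> str:
    """Map a MUSHRA score in [0, 100] to its quality band.

    Assumption (not stated in the dataset card): a score that sits exactly
    on a boundary, such as 80, is assigned to the higher band.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"MUSHRA scores must lie in [0, 100], got {score}")

    # Lower bounds of each band, checked from best to worst.
    bands = [(80, "Excellent"), (60, "Good"), (40, "Fair"), (20, "Poor")]
    for lower_bound, label in bands:
        if score >= lower_bound:
            return label
    return "Bad"


if __name__ == "__main__":
    for example in (92.5, 80, 47, 20, 3):
        print(example, mushra_category(example))
```

If the opposite boundary convention is preferred, changing `>=` to `>` assigns boundary scores to the lower band instead.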

Available Splits

The dataset includes the following splits based on the test type and language.

Split                     Number of Ratings
Hindi__MUSHRA             56500
Hindi__MUSHRA_DG          10000
Hindi__MUSHRA_DG_NMR      10000
Hindi__MUSHRA_NMR         51000
Tamil__MUSHRA             50000
Tamil__MUSHRA_DG          10000
Tamil__MUSHRA_DG_NMR      10000
Tamil__MUSHRA_NMR         48500
English__MUSHRA           4500
English__MUSHRA_DG_NMR    4650
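
Split names follow a `Language__TestType` pattern, so the language and test type can be recovered by splitting on the double underscore. The sketch below copies the counts listed above into a dictionary and aggregates them; the names `SPLIT_SIZES`, `parse_split`, and `ratings_per_language` are hypothetical helpers for illustration and are not part of any official loader.

```python
# Rating counts per split, copied from the split listing above.
SPLIT_SIZES = {
    "Hindi__MUSHRA": 56500,
    "Hindi__MUSHRA_DG": 10000,
    "Hindi__MUSHRA_DG_NMR": 10000,
    "Hindi__MUSHRA_NMR": 51000,
    "Tamil__MUSHRA": 50000,
    "Tamil__MUSHRA_DG": 10000,
    "Tamil__MUSHRA_DG_NMR": 10000,
    "Tamil__MUSHRA_NMR": 48500,
    "English__MUSHRA": 4500,
    "English__MUSHRA_DG_NMR": 4650,
}


def parse_split(name: str) -> tuple[str, str]:
    """Split 'Hindi__MUSHRA_DG' into ('Hindi', 'MUSHRA_DG')."""
    language, test_type = name.split("__", 1)
    return language, test_type


def ratings_per_language(sizes: dict[str, int]) -> dict[str, int]:
    """Sum the rating counts of all splits that share a language."""
    totals: dict[str, int] = {}
    for name, count in sizes.items():
        language, _ = parse_split(name)
        totals[language] = totals.get(language, 0) + count
    return totals


if __name__ == "__main__":
    print(ratings_per_language(SPLIT_SIZES))
    print(sum(SPLIT_SIZES.values()))
```

The per-split counts sum to 255,150, which matches the total number of ratings stated under Key Features.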

Tags

  • speech
  • Multilingual
  • Benchmark
  • Evaluation
  • Text to Speech
  • mushra
  • human-evaluation

License

Attribution 4.0 International (CC BY 4.0)