Tamil ASR Benchmark Dataset (Commonvoice Tamil)

Tamil ASR (Automatic Speech Recognition) benchmark dataset from Bhashini, intended to support the development of robust regional speech recognition systems.

About Dataset

This Tamil ASR benchmark dataset is designed to support the evaluation and development of Automatic Speech Recognition (ASR) systems for general-domain use. It contains high-quality audio samples of everyday language and conversation, which makes it suitable for both building and testing ASR models. The dataset was submitted by AI4Bharat and is intended to help advance speech recognition for regional languages.

Activity Overview

  • Downloads 41
  • Views 588
  • File Size 1005.15 MB

Tags

  • NLP Dataset
  • Benchmark
  • Tamil
  • General Domain
  • Automatic Speech Recognition
  • Speech Technology
  • AI4Bharat
  • ASR
  • Regional Languages
  • Audio Processing

License Control

Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)


Version Control

Version 1 (1005.15 MB)
  • admin · 11 month(s) ago
    • common_voice_ta_19083461.wav (audio/wav)
    • common_voice_ta_19083463.wav (audio/wav)
    • common_voice_ta_19083464.wav (audio/wav)
    • common_voice_ta_19083465.wav (audio/wav)
    • common_voice_ta_19083472.wav (audio/wav)
    • common_voice_ta_19083477.wav (audio/wav)
    • common_voice_ta_19083478.wav (audio/wav)
    • common_voice_ta_19083479.wav (audio/wav)
    • common_voice_ta_19083480.wav (audio/wav)
    • common_voice_ta_19083933.wav (audio/wav)
    • 5694 more
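
Since the release is a flat collection of several thousand WAV files, a quick sanity check after download can be useful. The following is a minimal sketch, not part of the dataset's own tooling: it assumes the archive has been extracted to a local directory and that the files are uncompressed PCM WAV, which this page does not state. The function name `summarize_wavs` and the `WavInfo` record are hypothetical, introduced only for illustration.

```python
import wave
from pathlib import Path
from typing import NamedTuple


class WavInfo(NamedTuple):
    """Header metadata for one audio file (hypothetical helper type)."""
    name: str
    sample_rate: int
    channels: int
    duration_s: float


def summarize_wavs(root: str) -> list[WavInfo]:
    """Collect basic header metadata for every .wav file under `root`.

    Assumption: files are PCM WAV readable by the standard-library
    `wave` module; anything it cannot parse is skipped.
    """
    infos = []
    for path in sorted(Path(root).rglob("*.wav")):
        try:
            with wave.open(str(path), "rb") as wf:
                rate = wf.getframerate()
                duration = wf.getnframes() / rate
                infos.append(WavInfo(path.name, rate, wf.getnchannels(), duration))
        except (wave.Error, EOFError):
            # Unsupported encoding or truncated file: skip rather than guess.
            continue
    return infos
```

Calling `summarize_wavs` on the extraction directory and summing the `duration_s` fields gives the total amount of audio, and comparing the number of returned records with the file count shown in the listing is a simple way to spot an incomplete download.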