ORGANISATION

Common Voice

Crowdsourced speech dataset covering multiple languages for ASR research.

About Dataset

Mozilla Common Voice is a large, crowdsourced speech dataset containing voice recordings contributed by volunteers from around the world. It covers dozens of languages and accents, with each recording paired with a validated text transcription. The dataset is designed to be open, inclusive, and representative of diverse speakers and linguistic communities.

Purpose of Dataset

Common Voice is widely used for training and evaluating automatic speech recognition (ASR) systems. Its multilingual and accent-diverse nature makes it valuable for building inclusive speech technologies. Researchers use it to improve speech recognition accuracy, reduce bias, and develop voice-enabled applications across languages and regions.