The Bharat Parallel Corpus Collection (BPCC) is a large-scale parallel corpus for machine translation across 22 Indian languages.
Developed by AI4Bharat at IIT Madras, BPCC aims to improve machine translation for all 22 scheduled Indian languages. It contains approximately 230 million sentence pairs, combining data mined from existing corpora with high-quality, human-curated datasets. BPCC supports the training of multilingual machine translation models such as IndicTrans2 and provides evaluation benchmarks for translation quality across diverse domains.
License: CC0 1.0 Public Domain
© 2026 - Copyright AIKosh. All rights reserved. This portal is developed by National e-Governance Division for AIKosh mission.