A multilingual variant of MS MARCO for Indian languages, containing a selected subset of queries and their corresponding passages with high-quality translations into 13 languages.
| Code | Language | Load Command | Sample Count |
|---|---|---|---|
| as | Assamese | load_dataset('ai4bharat/IndicMSMARCO', 'as') | ~999 |
| bn | Bengali | load_dataset('ai4bharat/IndicMSMARCO', 'bn') | ~999 |
| gu | Gujarati | load_dataset('ai4bharat/IndicMSMARCO', 'gu') | ~999 |
| hi | Hindi | load_dataset('ai4bharat/IndicMSMARCO', 'hi') | ~999 |
| kn | Kannada | load_dataset('ai4bharat/IndicMSMARCO', 'kn') | ~999 |
| ml | Malayalam | load_dataset('ai4bharat/IndicMSMARCO', 'ml') | ~999 |
| mr | Marathi | load_dataset('ai4bharat/IndicMSMARCO', 'mr') | ~999 |
| ne | Nepali | load_dataset('ai4bharat/IndicMSMARCO', 'ne') | ~999 |
| or | Odia | load_dataset('ai4bharat/IndicMSMARCO', 'or') | ~999 |
| pa | Punjabi | load_dataset('ai4bharat/IndicMSMARCO', 'pa') | ~999 |
| ta | Tamil | load_dataset('ai4bharat/IndicMSMARCO', 'ta') | ~999 |
| te | Telugu | load_dataset('ai4bharat/IndicMSMARCO', 'te') | ~999 |
| ur | Urdu | load_dataset('ai4bharat/IndicMSMARCO', 'ur') | ~999 |
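The load commands in the table can be wrapped in a small helper that checks the language code before calling the Hugging Face `datasets` library. This is a minimal sketch, assuming `load_dataset` from the `datasets` package is available, as the load commands imply. `INDIC_MSMARCO_CONFIGS`, `load_indic_msmarco`, and `load_all` are hypothetical helper names, not part of the dataset release. Split and column names are not documented here, so the helper returns whatever `load_dataset` produces instead of assuming a particular split such as `train`.

```python
from typing import Dict, Iterable, Optional

# Config codes and language names, copied from the dataset's language table.
INDIC_MSMARCO_CONFIGS: Dict[str, str] = {
    "as": "Assamese",
    "bn": "Bengali",
    "gu": "Gujarati",
    "hi": "Hindi",
    "kn": "Kannada",
    "ml": "Malayalam",
    "mr": "Marathi",
    "ne": "Nepali",
    "or": "Odia",
    "pa": "Punjabi",
    "ta": "Tamil",
    "te": "Telugu",
    "ur": "Urdu",
}

DATASET_ID = "ai4bharat/IndicMSMARCO"


def load_indic_msmarco(code: str):
    """Hypothetical helper: load one language config of IndicMSMARCO.

    The code is validated locally first; only then is `datasets.load_dataset`
    called, which needs the `datasets` package and network access or a cache.
    """
    if code not in INDIC_MSMARCO_CONFIGS:
        valid = ", ".join(sorted(INDIC_MSMARCO_CONFIGS))
        raise ValueError(f"Unknown config {code!r}; expected one of: {valid}")
    # Imported lazily so the validation above works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, code)


def load_all(codes: Optional[Iterable[str]] = None) -> dict:
    """Hypothetical helper: load several configs into a dict keyed by code.

    With no argument, every config listed in INDIC_MSMARCO_CONFIGS is loaded.
    """
    selected = list(INDIC_MSMARCO_CONFIGS) if codes is None else list(codes)
    return {code: load_indic_msmarco(code) for code in selected}
```

For example, `hindi = load_indic_msmarco("hi")` followed by `print(hindi)` shows the splits and columns that the Hindi config actually provides. Validating the code up front means that a typo such as `"od"` for Odia fails immediately with the list of valid codes, instead of failing later inside a network request.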
License: MIT
© 2026 - Copyright AIKosh. All rights reserved. This portal is developed by National e-Governance Division for AIKosh mission.