Government of India
MSMARCO-XI

The MS MARCO dataset translated into 14 Indic languages

About the Dataset

This dataset contains the MS MARCO dataset translated into 14 Indic languages. The original MS MARCO (Microsoft MAchine Reading COmprehension) dataset is a collection of queries, passages, and answers for machine reading comprehension and question answering tasks. Each example includes the original English content, its translation, and translation metadata.
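A minimal sketch of reading a split file in Python follows. It assumes only that each line of a `.jsonl` file is one standalone JSON object (the JSON Lines convention). This page does not document the record schema, so the sketch deliberately avoids hard-coding field names. Instead, `field_counts` tallies the top-level keys it finds so you can inspect the actual schema before writing task-specific code. The function names are illustrative and are not part of any published loader for this dataset.

```python
from __future__ import annotations

import json
from collections import Counter
from pathlib import Path
from typing import Iterable, Iterator


def parse_jsonl_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Parse JSON Lines text: one JSON object per non-empty line."""
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            raise ValueError(f"invalid JSON on line {line_no}: {err.msg}") from err
        yield record


def iter_jsonl(path: str | Path) -> Iterator[dict]:
    """Stream records from a JSONL file without loading it all into memory."""
    # UTF-8 is assumed here so that Indic scripts decode correctly.
    with open(path, encoding="utf-8") as fh:
        yield from parse_jsonl_lines(fh)


def field_counts(records: Iterable[dict]) -> Counter:
    """Count how many records contain each top-level key."""
    counts: Counter = Counter()
    for record in records:
        counts.update(record.keys())
    return counts
```

For example, `field_counts(iter_jsonl("hintrain.jsonl")).most_common()` lists the keys present in the Hindi training split, assuming that file has been downloaded to the working directory. Streaming line by line keeps memory use flat even for the larger training files.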

Supported Languages

| Language Code | Language Name | Train File | Validation File |
|---|---|---|---|
| as | Assamese | asmtrain.jsonl | asmval.jsonl |
| bn | Bengali | bentrain.jsonl | benval.jsonl |
| gu | Gujarati | gutrain.jsonl | guval.jsonl |
| hi | Hindi | hintrain.jsonl | hinval.jsonl |
| kn | Kannada | kantrain.jsonl | kanval.jsonl |
| ml | Malayalam | maltrain.jsonl | malval.jsonl |
| mr | Marathi | martrain.jsonl | marval.jsonl |
| ne | Nepali | neptrain.jsonl | nepval.jsonl |
| or | Odia | ortrain.jsonl | orval.jsonl |
| pa | Punjabi | pantrain.jsonl | panval.jsonl |
| sa | Sanskrit | santrain.jsonl | sanval.jsonl |
| ta | Tamil | tamtrain.jsonl | tamval.jsonl |
| te | Telugu | teltrain.jsonl | telval.jsonl |
| ur | Urdu | urdtrain.jsonl | urdval.jsonl |
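The file names in the table do not follow a single rule derived from the language codes. Gujarati and Odia keep their two-letter codes as prefixes, while the other languages use three-letter prefixes such as `hin` and `tam`. For that reason, an explicit lookup table is the simplest way to select files programmatically. The sketch below copies the prefixes from the table. `split_filename` and the `"train"`/`"validation"` split names are hypothetical conveniences. The function returns bare file names only, because this page does not give a download location.

```python
# Language code -> file-name prefix, copied from the "Supported Languages" table.
# Most prefixes are three letters, but Gujarati ("gu") and Odia ("or") keep
# their two-letter codes, so the prefix cannot be derived by one string rule.
FILE_PREFIX = {
    "as": "asm", "bn": "ben", "gu": "gu", "hi": "hin", "kn": "kan",
    "ml": "mal", "mr": "mar", "ne": "nep", "or": "or", "pa": "pan",
    "sa": "san", "ta": "tam", "te": "tel", "ur": "urd",
}

# Hypothetical split names mapped to the file-name suffixes used in the table.
SPLIT_SUFFIX = {"train": "train.jsonl", "validation": "val.jsonl"}


def split_filename(lang_code: str, split: str) -> str:
    """Return the file name for one language and split, e.g. ("hi", "train")."""
    if lang_code not in FILE_PREFIX:
        raise ValueError(f"unsupported language code: {lang_code!r}")
    if split not in SPLIT_SUFFIX:
        raise ValueError(f"unknown split {split!r}; expected one of {sorted(SPLIT_SUFFIX)}")
    return FILE_PREFIX[lang_code] + SPLIT_SUFFIX[split]
```

For instance, `split_filename("ta", "validation")` returns `"tamval.jsonl"`. Failing loudly on an unknown code avoids silently requesting a file that does not exist.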

Tags

  • Indic Languages
  • multilingual NLP
  • rag
  • retrieval

License

MIT