MILU - Multi-task Indic LLM performance evaluation dataset

MILU is a comprehensive benchmark dataset designed to evaluate the performance of Large Language Models (LLMs) across 11 Indic languages. It spans 8 domains and 41 subjects and contains approximately 80,000 multiple-choice questions grounded in culturally relevant knowledge from India.

About Dataset

The MILU (Multi-task Indic Language Understanding Benchmark) dataset is a large-scale evaluation dataset for assessing how well multilingual Large Language Models (LLMs) handle Indic languages. It covers 11 languages: Hindi, Bengali, Gujarati, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, Telugu, and English. The questions are drawn from 41 subjects grouped into 8 domains, including Arts & Humanities, Social Sciences, STEM, and Business. With approximately 80,000 multiple-choice questions and a validation set of 8,933 samples, MILU provides a rigorous benchmark for language understanding across varied languages and areas of knowledge. It incorporates culturally specific knowledge from Indian regional and state-level examinations, which makes it particularly suited to evaluating LLMs in the Indian linguistic context. The dataset is openly available under a CC-BY-4.0 license.
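
Because every question is multiple-choice and belongs to one language, results on a benchmark of this kind are often summarised as accuracy per language. The following is a minimal sketch of that calculation, not MILU's official evaluation code. The field names `language` and `answer`, the function `accuracy_by_language`, and the toy records are all hypothetical and chosen only for illustration; the actual dataset schema may differ.

```python
from collections import defaultdict


def accuracy_by_language(records, predictions):
    """Return {language: fraction of correct answers}.

    `records` is a list of dicts with hypothetical keys "language" and
    "answer"; `predictions` is a list of predicted option labels in the
    same order.
    """
    if len(records) != len(predictions):
        raise ValueError("records and predictions must have the same length")

    correct = defaultdict(int)
    total = defaultdict(int)
    for record, predicted in zip(records, predictions):
        lang = record["language"]
        total[lang] += 1
        if predicted == record["answer"]:
            correct[lang] += 1

    return {lang: correct[lang] / total[lang] for lang in total}


# Toy records invented for this example; they are not taken from MILU.
toy_records = [
    {"language": "Hindi", "answer": "A"},
    {"language": "Hindi", "answer": "C"},
    {"language": "Tamil", "answer": "B"},
]
toy_predictions = ["A", "B", "B"]

scores = accuracy_by_language(toy_records, toy_predictions)
print(scores)  # {'Hindi': 0.5, 'Tamil': 1.0}
```

Reporting one score per language, rather than a single pooled number, keeps weaker languages from being hidden by stronger ones. The same grouping could be applied to domains or subjects by switching the key.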