IndicParam is a graduate-level benchmark of 13,207 MCQs from UGC-NET exams, covering 11 Indic languages and a Sanskrit–English code-mixed subset. It evaluates LLMs on low-resource languages across native scripts, measuring linguistic understanding and domain knowledge. With diverse question formats, it enables fine-grained analysis and highlights gaps in multilingual and cross-lingual performance. Paper: https://arxiv.org/pdf/2512.00333
IndicParam is a large-scale, graduate-level benchmark dataset designed to evaluate the performance of Large Language Models (LLMs) on low-resource and extremely low-resource Indic languages. The dataset consists of 13,207 multiple-choice questions (MCQs) collected from official UGC-NET language examination papers and their corresponding answer keys. These questions span 11 Indic languages (Nepali, Marathi, Gujarati, Odia, Maithili, Konkani, Santali, Bodo, Dogri, Rajasthani, and Sanskrit), along with an additional Sanskrit–English code-mixed subset.

Each data instance represents a single MCQ and includes:
- A question in the native script of the target language
- Four answer options (A–D)
- The correct answer label
- Metadata such as subject, exam name, and question type

The dataset covers a wide range of question formats, including:
- Standard multiple-choice questions
- Assertion–Reason
- List Matching
- Fill in the Blanks
- Identify Incorrect Statement
- Ordering

IndicParam is specifically structured to evaluate both:
- Language Understanding (LU): linguistic knowledge such as grammar, syntax, and semantics
- General Knowledge (GK): domain knowledge including literature, history, and cultural context

All questions are preserved in their original scripts (Devanagari, Gujarati, Odia, and Ol Chiki), ensuring authentic evaluation of multilingual capabilities without reliance on transliteration.

The dataset is released as a single test split (13,207 samples) and is intended exclusively for evaluation purposes, enabling standardized and reproducible benchmarking of LLMs across diverse Indic languages.

Overall, IndicParam provides a comprehensive and challenging evaluation suite for measuring multilingual understanding, cross-lingual generalization, and cultural competence in modern language models.
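The instance structure described above can be pictured with a small Python sketch. This is only an illustration: the class name `MCQRecord`, every field name, and the prompt layout are assumptions made here, not the dataset's actual schema, so check the released files for the real column names. The example record uses placeholder strings rather than real dataset content.

```python
from dataclasses import dataclass


@dataclass
class MCQRecord:
    # Hypothetical field names; the released dataset may name these differently.
    question: str            # question text in the native script
    options: dict[str, str]  # maps the labels "A".."D" to option text
    answer: str              # correct label, one of "A".."D"
    language: str            # e.g. "Sanskrit"
    subject: str
    exam_name: str
    question_type: str       # e.g. "Assertion-Reason", "List Matching"


def format_prompt(rec: MCQRecord) -> str:
    """Render one MCQ as a plain-text prompt ending in an answer cue."""
    lines = [rec.question]
    for label in ("A", "B", "C", "D"):
        lines.append(f"{label}. {rec.options[label]}")
    lines.append("Answer:")
    return "\n".join(lines)


# Placeholder record for illustration only (not taken from the dataset).
example = MCQRecord(
    question="<question text in native script>",
    options={"A": "<option 1>", "B": "<option 2>",
             "C": "<option 3>", "D": "<option 4>"},
    answer="B",
    language="Sanskrit",
    subject="<subject>",
    exam_name="UGC-NET",
    question_type="Standard MCQ",
)
print(format_prompt(example))
```

Keeping the metadata on each record is what makes it possible to slice results later by language or question type.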
The primary purpose of IndicParam is to provide a rigorous, standardized benchmark for evaluating Large Language Models (LLMs) on low- and extremely low-resource Indic languages, addressing the gap where models perform well on high-resource languages but struggle to generalize. IndicParam enables evaluation of both language understanding (morphology, syntax, semantics, discourse) and domain knowledge (literature, culture, history). Through diverse MCQ formats such as normal MCQs, Assertion–Reason, List Matching, Fill in the Blanks, and Ordering, it supports fine-grained analysis beyond simple question answering.

Covering 11 Indic languages and a Sanskrit–English code-mixed variant, the dataset allows per-language benchmarking and comparison across scripts and linguistic settings, helping identify disparities between low- and extremely low-resource languages. Since all questions are presented in native scripts, it evaluates true multilingual capability without reliance on transliteration.

As a test-only benchmark with deterministic evaluation and accuracy-based metrics, IndicParam ensures standardized and reproducible comparisons across models. It also highlights limitations in cross-lingual transfer from high-resource languages like English. Overall, IndicParam aims to drive the development of more inclusive, robust, and culturally grounded AI systems, while informing future multilingual pretraining, data collection, and evaluation strategies.
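The accuracy-based, per-language evaluation mentioned above could look roughly like the sketch below. It is not the benchmark's official scoring script: the record keys (`"language"`, `"answer"`), the function names, and the naive label-extraction heuristic are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def extract_label(model_output: str):
    """Return the first standalone uppercase A-D in the output, or None.

    Naive heuristic (an assumption of this sketch): it can misfire on text
    such as "A good choice is B", so stricter parsing may be needed.
    """
    match = re.search(r"\b([A-D])\b", model_output)
    return match.group(1) if match else None


def accuracy_by_group(records, predictions, key="language"):
    """Compute accuracy per group, e.g. per language or per question type.

    `records` is a list of dicts with hypothetical keys `key` and "answer";
    `predictions` is a parallel list of predicted labels (or None).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec, pred in zip(records, predictions):
        group = rec[key]
        total[group] += 1
        if pred == rec["answer"]:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Toy records for illustration only (not taken from the dataset).
records = [
    {"language": "Sanskrit", "answer": "A"},
    {"language": "Sanskrit", "answer": "B"},
    {"language": "Bodo", "answer": "C"},
]
outputs = ["Answer: A", "Answer: C", "C"]
predictions = [extract_label(o) for o in outputs]
print(accuracy_by_group(records, predictions))
```

Passing a different `key` (for example a question-type field, if the records carry one) gives the same breakdown by question format instead of by language.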
Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
© 2026 - Copyright AIKosh. All rights reserved.