ParamBench is a graduate-level benchmark dataset for evaluating Large Language Models (LLMs) on India-centric subjects. It contains 17,275 Hindi MCQs across 21 disciplines from competitive exams, enabling assessment of subject knowledge, cultural understanding, and reasoning abilities. Paper: https://arxiv.org/pdf/2508.16185
ParamBench is a large-scale, graduate-level benchmark dataset designed to evaluate the performance of Large Language Models (LLMs) on India-centric subjects and culturally grounded knowledge. The dataset consists of 17,275 multiple-choice questions (MCQs) in Hindi, collected from Indian competitive examination papers and their corresponding answer keys. It spans 21 academic subjects, including Anthropology, Sociology, History, Law, Political Science, Economics, Philosophy, and Indian Culture, covering the humanities, social sciences, and domain-specific knowledge.

Each data instance represents a single MCQ and includes a question in Hindi, four answer options (A-D), the correct answer label, and metadata such as subject, exam name, and question type. The dataset includes six question formats (Normal MCQ, Assertion-Reason, Match the List, Ordering, Fill in the Blank, and Identify Incorrect Statement), enabling fine-grained evaluation of reasoning and analytical capabilities. All questions are preserved in Hindi, so linguistic and cultural understanding is evaluated on the original text rather than on translations.

The dataset is released as a single test split (17,275 samples) and is intended exclusively for evaluation, enabling standardized and reproducible benchmarking of LLMs across subjects and question types. Overall, ParamBench provides a comprehensive and challenging evaluation suite for measuring the subject-wise knowledge, cultural awareness, and reasoning ability of modern language models in the Indian context.
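The instance structure described above can be sketched in Python as follows. This is a minimal illustration, not the published schema: the field names (question, options, answer, subject, exam_name, question_type), the MCQRecord class, and the format_prompt helper are all assumptions, and the sample question is invented for the example rather than taken from the dataset.

```python
from dataclasses import dataclass

# The six question formats named in the dataset description.
QUESTION_TYPES = frozenset({
    "Normal MCQ",
    "Assertion-Reason",
    "Match the List",
    "Ordering",
    "Fill in the Blank",
    "Identify Incorrect Statement",
})
OPTION_LABELS = ("A", "B", "C", "D")


@dataclass(frozen=True)
class MCQRecord:
    """One MCQ instance. Field names are assumed, not the official schema."""
    question: str
    options: dict  # maps "A".."D" to the option text
    answer: str
    subject: str
    exam_name: str
    question_type: str

    def __post_init__(self):
        # Validate the shape the description promises: four options, A-D.
        if tuple(sorted(self.options)) != OPTION_LABELS:
            raise ValueError("options must have exactly the keys A, B, C, D")
        if self.answer not in OPTION_LABELS:
            raise ValueError("answer must be one of A, B, C, D")
        if self.question_type not in QUESTION_TYPES:
            raise ValueError(f"unknown question type: {self.question_type}")


def format_prompt(record):
    """Render the question and its four options as plain text for a model."""
    lines = [record.question]
    lines += [f"{label}. {record.options[label]}" for label in OPTION_LABELS]
    return "\n".join(lines)


# Invented placeholder record, NOT an item from ParamBench.
example = MCQRecord(
    question="भारत की राजधानी क्या है?",
    options={"A": "मुंबई", "B": "नई दिल्ली", "C": "कोलकाता", "D": "चेन्नई"},
    answer="B",
    subject="Political Science",
    exam_name="(hypothetical exam)",
    question_type="Normal MCQ",
)

print(format_prompt(example))
```

Validating records on construction is one way to catch malformed rows early; the actual column names and label encoding should be checked against the released files before reuse.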
The primary purpose of ParamBench is to provide a rigorous, standardized benchmark for evaluating Large Language Models (LLMs) on India-centric subjects, addressing the lack of culturally grounded and non-English evaluation datasets. It enables comprehensive assessment of subject-specific knowledge across 21 academic disciplines, while also measuring reasoning abilities through diverse question formats such as Assertion-Reason, Match the List, Ordering, Fill in the Blank, and Identify Incorrect Statement. By using graduate-level questions from competitive exams, the dataset evaluates both conceptual understanding and analytical thinking. ParamBench also aims to assess cultural and contextual understanding by focusing on Indian knowledge domains such as History, Philosophy, Law, and Culture, which are often underrepresented in existing benchmarks. By preserving all data in Hindi, it ensures authentic evaluation of multilingual capabilities without reliance on translation. Additionally, the dataset supports fine-grained analysis through subject-wise and question-type performance, helping identify weaknesses in model behavior. As a test-only benchmark with a standardized evaluation setup, ParamBench enables reproducible comparisons and provides insights to guide future research in multilingual, culturally aware AI systems.
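The subject-wise and question-type analysis mentioned above amounts to grouping predictions by a metadata field and computing accuracy per group. The sketch below shows one way to do this. The accuracy_by function, the dictionary keys (subject, question_type, answer), and the toy records are assumptions made for illustration. They are not the benchmark's official evaluation code or data.

```python
from collections import defaultdict


def accuracy_by(records, predictions, key):
    """Return {group: accuracy} where groups are the values of records[i][key].

    records: list of dicts with an "answer" label and the grouping key.
    predictions: list of predicted labels, aligned with records.
    """
    if len(records) != len(predictions):
        raise ValueError("records and predictions must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for record, predicted in zip(records, predictions):
        group = record[key]
        total[group] += 1
        # Normalize case and whitespace so "b " still matches "B".
        correct[group] += int(predicted.strip().upper() == record["answer"])
    return {group: correct[group] / total[group] for group in total}


# Toy metadata and predictions, invented for this example.
records = [
    {"subject": "History", "question_type": "Normal MCQ", "answer": "A"},
    {"subject": "History", "question_type": "Assertion-Reason", "answer": "C"},
    {"subject": "Law", "question_type": "Normal MCQ", "answer": "B"},
    {"subject": "Law", "question_type": "Ordering", "answer": "D"},
]
predictions = ["A", "B", "b", "D"]

print(accuracy_by(records, predictions, "subject"))
# {'History': 0.5, 'Law': 1.0}
print(accuracy_by(records, predictions, "question_type"))
# {'Normal MCQ': 1.0, 'Assertion-Reason': 0.0, 'Ordering': 1.0}
```

Reporting both breakdowns side by side makes it easier to see whether a low score comes from a weak subject area or from a question format the model handles poorly.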
Creative Commons Attribution 4.0 International (CC BY 4.0)
© 2026 - Copyright AIKosh. All rights reserved.