EkaIndicMTEB

About Dataset

Eka-IndicMTEB is an evaluation dataset of Indian multilingual medical terms, designed to test embedding models on medical terminology across multiple Indic languages and scripts. It contains 2,532 doctor-verified queries that capture the linguistic and domain-specific diversity of the Indian healthcare ecosystem. The dataset includes medical entities spanning symptoms, diagnoses, procedures, medications, and related concepts, enriched with real-world linguistic variations such as spelling errors, special characters, abbreviations, and colloquial expressions. It covers multiple languages, including English, Hindi, Bengali, Tamil, Telugu, Kannada, Marathi, and Malayalam.

Purpose of Dataset

Eka-IndicMTEB addresses a critical gap in multilingual medical AI evaluation by offering:

  • A shared evaluation framework: Researchers can now benchmark multilingual medical embeddings against a standardized, clinically validated dataset spanning multiple Indian languages.
  • Insight into model strengths and weaknesses: The benchmark systematically reveals how models handle India's linguistic diversity, identifying specific failure modes and success patterns across different language families and medical domains.
  • Guidance for model development: Performance analysis across varied query types provides actionable insights for targeted model improvements.

This benchmark is intended for researchers developing cross-lingual medical information retrieval systems and AI teams building multilingual clinical decision support tools. Healthcare organizations deploying language-agnostic medical chatbots or semantic search systems can use this dataset to validate performance across India's diverse linguistic landscape. Academic institutions working on low-resource medical NLP can use this benchmark to identify gaps and measure progress in Indian-language healthcare AI.
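
As a rough illustration of the per-language analysis described above, the following Python sketch scores a set of query embeddings against a small entity corpus and reports recall@k for each language. This page does not specify the dataset's schema or its official metric, so the record layout, the entity names, the toy 2-dimensional vectors, and the choice of cosine-similarity recall@k are all assumptions made for illustration, not the benchmark's actual protocol.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recall_at_k_by_language(queries, corpus, k=1):
    """Fraction of queries, per language, whose gold entity ranks in the top k.

    queries: list of (language, query_vector, gold_entity_id) tuples
             (hypothetical layout, not the dataset's real schema)
    corpus:  dict mapping entity_id -> entity_vector
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for language, q_vec, gold_id in queries:
        # Rank all corpus entities by similarity to the query, best first.
        ranked = sorted(
            corpus, key=lambda eid: cosine(q_vec, corpus[eid]), reverse=True
        )
        totals[language] += 1
        if gold_id in ranked[:k]:
            hits[language] += 1
    return {lang: hits[lang] / totals[lang] for lang in totals}


# Toy vectors standing in for real model embeddings (illustrative only).
corpus = {"fever": [1.0, 0.0], "headache": [0.0, 1.0]}
queries = [
    ("English", [0.9, 0.1], "fever"),
    ("Hindi", [0.8, 0.3], "fever"),
    ("Hindi", [0.7, 0.6], "headache"),  # a miss: this vector is closer to "fever"
]

scores = recall_at_k_by_language(queries, corpus, k=1)
print(scores)
```

In practice the toy vectors would be replaced by embeddings from the model under evaluation, and the same grouping could be applied to other attributes such as entity type or query variation.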

Activity Overview

  • Downloads 4
  • Redirect 16
  • Views 95
  • File Size 0

Tags

  • Dataset
  • indic
  • medical-embeddings

License Control

Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)