This is a fine-tuned version of aya-expanse-8b for Matrix Language Identification (MLI) on Hinglish (Hindi-English code-mixed) text. It classifies each sentence by the matrix language that governs its grammatical structure: hi (Hindi) or en (English).
A LoRA-adapted Transformer LLM fine-tuned for sentence-level Matrix Language Identification (MLI) on Hindi–English (Hinglish) code-mixed text.
hi - Hindi matrix (dominant grammatical structure follows Hindi)
en - English matrix (dominant grammatical structure follows English)

Base model: CohereForAI/aya-expanse-8b

Achieves 98.77 F1 (with 98.90 Precision and 98.77 Recall) on the COMI-LINGUA MLI test set (5K instances), setting the state of the art (SOTA) among open-weight models and outperforming zero-shot closed LLMs (e.g., gpt-4o ≈ 98.0 F1) and traditional tools.

| Setting | Precision | Recall | F1-score |
|---|---|---|---|
| Zero-shot | 98.71 | 59.56 | 74.25 |
| One-shot | 98.35 | 81.36 | 89.00 |
| Fine-tuned | 98.90 | 98.77 | 98.77 |
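
As a rough sanity check on the table above, F1 for a single precision/recall pair is their harmonic mean. The sketch below recomputes it for the zero-shot and one-shot rows. The function name `f1_score` is a hypothetical helper, and the card does not say how the reported scores were averaged across classes, so an exact match is not expected.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs copied from the zero-shot and one-shot rows.
rows = {"Zero-shot": (98.71, 59.56), "One-shot": (98.35, 81.36)}
for setting, (p, r) in rows.items():
    print(f"{setting}: F1 = {f1_score(p, r):.2f}")
```

The recomputed values land within about 0.1 points of the reported F1 scores; the small gap may come from per-class averaging, which is an assumption rather than something stated here.
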
Identify the matrix language (hi = Hindi matrix, en = English matrix) in the sentence:
PM Narendra Modi ne Google CEO Sundar Pichai se mulakat ki.
Output: 'hi'
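
The prompt format above can be wrapped in a small helper. This is a minimal sketch, not the authors' inference code: `build_prompt`, `parse_label`, and `classify` are hypothetical names, and `generate` stands for any callable that sends a prompt to the model and returns only the newly generated text (for example, a thin wrapper around a Transformers generation call). Whether training used exactly this template is also an assumption.

```python
import re
from typing import Callable, Optional

# Instruction wording copied from the example prompt in this card.
INSTRUCTION = (
    "Identify the matrix language (hi = Hindi matrix, en = English matrix) "
    "in the sentence:"
)


def build_prompt(sentence: str) -> str:
    """Return the instruction followed by the sentence on its own line."""
    return f"{INSTRUCTION}\n{sentence.strip()}"


def parse_label(generated: str) -> Optional[str]:
    """Return the first standalone 'hi' or 'en' in the text, else None."""
    match = re.search(r"\b(hi|en)\b", generated.lower())
    return match.group(1) if match else None


def classify(sentence: str, generate: Callable[[str], str]) -> Optional[str]:
    """Build the prompt, call the (hypothetical) generator, parse the label."""
    return parse_label(generate(build_prompt(sentence)))


# Stand-in generator so the sketch runs without downloading the model.
def fake_generate(prompt: str) -> str:
    return "'hi'"


label = classify(
    "PM Narendra Modi ne Google CEO Sundar Pichai se mulakat ki.",
    fake_generate,
)
print(label)  # prints: hi
```

Pass only the model's continuation to `parse_label`, not the echoed prompt, since the instruction itself contains both label strings.
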
@inproceedings{sheth-etal-2025-comi,
title = "{COMI}-{LINGUA}: Expert Annotated Large-Scale Dataset for Multitask {NLP} in {H}indi-{E}nglish Code-Mixing",
author = "Sheth, Rajvee and
Beniwal, Himanshu and
Singh, Mayank",
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Rose, Carolyn and
Peng, Violet",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-emnlp.422/",
pages = "7973--7992",
isbn = "979-8-89176-335-7"
}

License: Apache 2.0
Rajvee Sheth, Mayank Singh