This is a LoRA-adapted, fine-tuned version of aya-expanse-8b for token-level Part-of-Speech (POS) tagging on Hindi–English (Hinglish) code-mixed text. It assigns a grammatical category to each token using a language-agnostic Universal POS tagset suitable for code-mixed content in Roman and Devanagari scripts.
Tagset: `NOUN`, `PROPN`, `VERB`, `ADJ`, `ADV`, `ADP`, `PRON`, `DET`, `CONJ`, `PART`, `PRON_WH`, `PART_NEG`, `NUM`, `X` (typos, punctuation, abbreviations, foreign elements)

Base model: `CohereForAI/aya-expanse-8b`

The model achieves 88.61 F1 on the COMI-LINGUA POS test set (5K instances), competitive with or slightly outperforming specialized traditional tools and surpassing strong zero-/one-shot LLM baselines.
| Setting | Precision | Recall | F1 |
|---|---|---|---|
| Zero-shot | 76.92 | 29.50 | 40.55 |
| One-shot | 55.29 | 48.70 | 48.20 |
| Fine-tuned | 88.97 | 88.55 | 88.61 |
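The scores above are token-level averages over the test set. As a hedged illustration (the paper's exact averaging scheme is not restated here, and the helper name is ours), micro-averaged precision, recall, and F1 over aligned (token, tag) pairs can be computed as:

```python
def micro_prf(gold, pred):
    """Micro-averaged precision/recall/F1 for token-level tagging.

    gold, pred: lists of (token, tag) pairs aligned by position.
    A prediction counts as correct when both token and tag match.
    """
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


gold = [("मीराबाई", "PROPN"), ("ने", "PART"), ("जीता", "VERB")]
pred = [("मीराबाई", "PROPN"), ("ने", "ADP"), ("जीता", "VERB")]
p, r, f = micro_prf(gold, pred)  # 2 of 3 tags correct
```

With perfectly aligned tokenizations micro precision and recall coincide; the small gap in the table suggests per-class (macro) averaging or alignment differences in the official evaluation.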
Assign Part-of-Speech (POS) tags to each token in the sentence given as:
मीराबाई चानू ने 21 st Commonwealth Games में India के लिए first Gold medal जीता था।
Output:
```json
[
  {"मीराबाई": "PROPN"},
  {"चानू": "PROPN"},
  {"ने": "PART"},
  {"21": "NUM"},
  {"st": "X"},
  {"Commonwealth": "PROPN"},
  {"Games": "PROPN"},
  {"में": "ADP"},
  {"India": "PROPN"},
  {"के": "ADP"},
  {"लिए": "ADP"},
  {"first": "ADJ"},
  {"Gold": "NOUN"},
  {"medal": "NOUN"},
  {"जीता": "VERB"},
  {"था": "VERB"},
  {"।": "X"}
]
```
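The model emits its tagging as a JSON list of single-key objects. A minimal sketch for building the instruction prompt and parsing/validating such output (the prompt wording mirrors the example above; the helper names are illustrative, not part of any released code):

```python
import json

# Tagset from the model card; X covers typos, punctuation,
# abbreviations, and foreign elements.
TAGSET = {"NOUN", "PROPN", "VERB", "ADJ", "ADV", "ADP", "PRON", "DET",
          "CONJ", "PART", "PRON_WH", "PART_NEG", "NUM", "X"}


def build_prompt(sentence: str) -> str:
    """Reproduce the instruction format shown in the example above."""
    return ("Assign Part-of-Speech (POS) tags to each token in the "
            f"sentence given as:\n{sentence}")


def parse_tags(raw: str):
    """Parse the model's JSON output into (token, tag) pairs.

    Raises ValueError on malformed objects or tags outside the tagset.
    """
    pairs = []
    for obj in json.loads(raw):
        if len(obj) != 1:
            raise ValueError(f"expected one token per object, got {obj}")
        (token, tag), = obj.items()
        if tag not in TAGSET:
            raise ValueError(f"unknown tag {tag!r} for token {token!r}")
        pairs.append((token, tag))
    return pairs


raw = '[{"मीराबाई": "PROPN"}, {"ने": "PART"}, {"जीता": "VERB"}]'
pairs = parse_tags(raw)  # [("मीराबाई", "PROPN"), ("ने", "PART"), ("जीता", "VERB")]
```

Validating against the tagset catches the most common generation failure for instruction-tuned taggers: an out-of-inventory label that would silently corrupt downstream evaluation.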
```bibtex
@inproceedings{sheth-etal-2025-comi,
  title = "{COMI}-{LINGUA}: Expert Annotated Large-Scale Dataset for Multitask {NLP} in {H}indi-{E}nglish Code-Mixing",
  author = "Sheth, Rajvee and Beniwal, Himanshu and Singh, Mayank",
  editor = "Christodoulopoulos, Christos and Chakraborty, Tanmoy and Rose, Carolyn and Peng, Violet",
  booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
  month = nov,
  year = "2025",
  address = "Suzhou, China",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2025.findings-emnlp.422/",
  pages = "7973--7992",
  isbn = "979-8-89176-335-7"
}
```

License: Apache 2.0
Authors: Rajvee Sheth, Mayank Singh
Library: Transformers