This is a fine-tuned version of aya-expanse-8b for Named Entity Recognition (NER) on Hinglish (Hindi-English code-mixed) text. It performs token-level entity tagging (PERSON, ORGANISATION, LOCATION, DATE, TIME, GPE, HASHTAG, EMOJI, MENTION, X/Other) on text written in Roman or Devanagari script. It achieves 94.90 F1 on the COMI-LINGUA test set (5K instances), compared with 59.88 F1 for zero-shot inference.
A LoRA-adapted Transformer LLM fine-tuned for token-level Named Entity Recognition (NER) on Hindi–English (Hinglish) code-mixed text.
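The card does not state the LoRA rank, the scaling factor, or which weight matrices were adapted. For reference only, the standard LoRA formulation (Hu et al., 2021) keeps each pretrained weight matrix $W_0 \in \mathbb{R}^{d \times k}$ frozen and trains a low-rank update, so that a layer's output becomes:

$$
h = W_0 x + \frac{\alpha}{r} B A x, \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)
$$

Each adapted matrix therefore adds $r(d + k)$ trainable parameters instead of the $dk$ that full fine-tuning of that matrix would update.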
Entity labels:

- PERSON: names of individuals
- ORGANISATION: institutions or companies
- LOCATION: non-political physical locations
- DATE: temporal expressions (dates)
- TIME: temporal expressions (times)
- GPE: geo-political entities
- HASHTAG: words prefixed by ‘#’
- EMOJI: emoticons conveying emotions
- MENTION: user mentions prefixed by ‘@’
- X / Other: non-entity tokens (common words, punctuation, etc.)

Base model: CohereForAI/aya-expanse-8b

The model achieves 94.90 F1 on the COMI-LINGUA NER test set (5K instances), establishing state-of-the-art performance for Hinglish NER among open-weight models. It substantially outperforms the zero-shot baseline (59.88 F1), which demonstrates the value of fine-tuning for entity boundary detection in mixed-script, code-mixed social media and news text.
| Setting | Precision | Recall | F1-score |
|---|---|---|---|
| Zero-shot | 54.47 | 68.27 | 59.88 |
| One-shot | 79.73 | 81.44 | 79.18 |
| Fine-tuned | 94.94 | 94.91 | 94.90 |
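The card does not say how the precision, recall, and F1 in the table are averaged. One common scheme, shown here purely as an assumption, scores each entity label at the token level and then takes the unweighted (macro) mean across labels. Under such a scheme the reported F1 need not equal 2PR/(P+R) of the reported precision and recall, which would be consistent with the one-shot row, where F1 (79.18) is below both precision and recall. The function name `per_label_prf` and the choice to leave X out of the averaged labels are illustrative assumptions, not the authors' evaluation code.

```python
from collections import Counter


def per_label_prf(gold, pred, labels):
    """Token-level precision, recall and F1 for each label, plus their macro mean.

    gold and pred are equal-length lists holding one label per token.
    Only the labels passed in `labels` are scored and averaged.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # the predicted label p was wrong for this token
            fn[g] += 1  # the gold label g was missed for this token
    scores = {}
    for lab in labels:
        prec = tp[lab] / (tp[lab] + fp[lab]) if tp[lab] + fp[lab] else 0.0
        rec = tp[lab] / (tp[lab] + fn[lab]) if tp[lab] + fn[lab] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[lab] = (prec, rec, f1)
    # Unweighted (macro) mean of each metric across the scored labels.
    macro = tuple(sum(s[i] for s in scores.values()) / len(scores) for i in range(3))
    return scores, macro


if __name__ == "__main__":
    gold = ["GPE", "X", "ORGANISATION", "ORGANISATION", "X", "PERSON", "PERSON"]
    pred = ["GPE", "X", "LOCATION", "ORGANISATION", "X", "PERSON", "X"]
    # Toy example: only the gold entity labels are scored, and X is excluded
    # (an assumption); a real evaluation would pass the full label list.
    _, (p, r, f1) = per_label_prf(gold, pred, ["GPE", "ORGANISATION", "PERSON"])
    print(f"macro P={p:.3f} R={r:.3f} F1={f1:.3f}")
    print(f"2PR/(P+R)={2 * p * r / (p + r):.3f}")
```

On this toy input the macro F1 is 7/9 (about 0.778), while the harmonic mean of macro precision (1.0) and macro recall (2/3) is 0.8, so the two quantities diverge even on seven tokens.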
Identify named entities in the sentence:
लंदन के Madame Tussauds में Deepika Padukone के wax statue का गुरुवार को अनावरण हुआ।
Output:
[
{"लंदन": "GPE"},
{"के": "X"},
{"Madame": "ORGANISATION"},
{"Tussauds": "ORGANISATION"},
{"में": "X"},
{"Deepika": "PERSON"},
{"Padukone": "PERSON"},
{"के": "X"},
{"wax": "X"},
{"statue": "X"},
{"का": "X"},
{"गुरुवार": "DATE"},
{"को": "X"},
{"अनावरण": "X"},
{"हुआ।": "X"}
]
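In the example, the sentence reads roughly "Deepika Padukone's wax statue was unveiled at Madame Tussauds in London on Thursday." The output is one JSON object per token with no B-/I- boundary prefixes, so a multi-token entity such as "Madame Tussauds" appears as consecutive tokens carrying the same label. The sketch below assumes the model emits exactly this JSON shape. It builds a prompt in the same form as the example, checks each label against the ten labels listed earlier, and groups consecutive same-label tokens into entity spans. The prompt template and the names `build_prompt`, `parse_output`, and `group_spans` are hypothetical. Loading the model is not shown because the card gives no adapter repository ID.

```python
import json

# The ten tags listed in this model card ("X" covers non-entity tokens).
LABELS = {"PERSON", "ORGANISATION", "LOCATION", "DATE", "TIME", "GPE",
          "HASHTAG", "EMOJI", "MENTION", "X"}


def build_prompt(sentence: str) -> str:
    # Assumed template, copied from the single example in the card; the exact
    # training prompt (and any chat template) is not documented.
    return f"Identify named entities in the sentence:\n{sentence}\nOutput:\n"


def parse_output(text: str) -> list[tuple[str, str]]:
    """Turn the model's JSON list of one-key objects into (token, label) pairs."""
    pairs = []
    for obj in json.loads(text):
        if not isinstance(obj, dict) or len(obj) != 1:
            raise ValueError(f"expected a one-key object, got {obj!r}")
        token, label = next(iter(obj.items()))
        if label not in LABELS:
            raise ValueError(f"unknown label {label!r} for token {token!r}")
        pairs.append((token, label))
    return pairs


def group_spans(pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Merge runs of consecutive tokens that share a non-X label into one span.

    The output has no B-/I- prefixes, so two different entities of the same
    type that sit next to each other would be merged; this is a heuristic.
    """
    spans: list[tuple[str, str]] = []
    prev = "X"
    for token, label in pairs:
        if label == "X":
            prev = "X"
            continue
        if label == prev:
            # Same label as the previous token: extend the current span.
            spans[-1] = (spans[-1][0] + " " + token, label)
        else:
            spans.append((token, label))
        prev = label
    return spans


# The model output shown in the example above.
EXAMPLE_OUTPUT = """[
{"लंदन": "GPE"}, {"के": "X"}, {"Madame": "ORGANISATION"},
{"Tussauds": "ORGANISATION"}, {"में": "X"}, {"Deepika": "PERSON"},
{"Padukone": "PERSON"}, {"के": "X"}, {"wax": "X"}, {"statue": "X"},
{"का": "X"}, {"गुरुवार": "DATE"}, {"को": "X"}, {"अनावरण": "X"}, {"हुआ।": "X"}
]"""

if __name__ == "__main__":
    print(build_prompt("लंदन के Madame Tussauds में Deepika Padukone के wax statue का गुरुवार को अनावरण हुआ।"))
    for span, label in group_spans(parse_output(EXAMPLE_OUTPUT)):
        print(label, span)
```

Applied to the example output, `group_spans` returns four spans: लंदन (GPE), Madame Tussauds (ORGANISATION), Deepika Padukone (PERSON), and गुरुवार (DATE).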
@inproceedings{sheth-etal-2025-comi,
title = "{COMI}-{LINGUA}: Expert Annotated Large-Scale Dataset for Multitask {NLP} in {H}indi-{E}nglish Code-Mixing",
author = "Sheth, Rajvee and
Beniwal, Himanshu and
Singh, Mayank",
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Rose, Carolyn and
Peng, Violet",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-emnlp.422/",
pages = "7973--7992",
isbn = "979-8-89176-335-7"
}
Apache 2.0
Rajvee Sheth, Mayank Singh
Transformers
Open
Science, Technology and Research
10/02/26 06:29:27
979.66 MB
© 2026 - Copyright AIKosh. All rights reserved. This portal is developed by National e-Governance Division for AIKosh mission.