ORGANISATION

COMI-LINGUA-MT

This is a fine-tuned version of Llama-3.1-8B-Instruct for Machine Translation (MT) on Hinglish (Hindi-English code-mixed) text. It translates code-mixed input in Roman/Devanagari scripts to three target formats: (i) Standard English, (ii) Romanized Hindi, and (iii) Devanagari Hindi.

About Model

Hindi-English Translation Model

A LoRA-adapted Transformer LLM fine-tuned for Hinglish-to-monolingual translation preserving natural fluency and code-mixing nuances. Translates code-mixed input (Roman + Devanagari scripts) into three target formats: (i) Standard English, (ii) Romanized Hindi, (iii) Devanagari Hindi.

Supported Target Formats

Standard English
Romanized Hindi
Devanagari Hindi

Model Overview

Model type: LoRA-adapted Transformer LLM
Base model: meta-llama/Llama-3.1-8B-Instruct
Total parameters: 8B
Trainable parameters: ~32M
License: Apache 2.0
Languages: Hindi, English (code-mixed input → monolingual output)

Performance

Achieves strong gains across all three target formats on the COMI-LINGUA MT test set (5K instances), setting new benchmarks for Hinglish-to-monolingual translation among open-weight models. Significantly outperforms zero-shot and one-shot prompting of the same base model and several larger/closed-weight LLMs.

Setting	Model	Target Language	BLEU	chrF++
Zero-shot	Llama-3.1-8B-Instruct	Standard English	38.3	67.5
Zero-shot	Llama-3.1-8B-Instruct	Romanized Hindi	15.6	49.2
Zero-shot	Llama-3.1-8B-Instruct	Devanagari Hindi	7.4	13.5
One-shot	Llama-3.1-8B-Instruct	Standard English	45.8	72.4
One-shot	Llama-3.1-8B-Instruct	Romanized Hindi	35.3	67.0
One-shot	Llama-3.1-8B-Instruct	Devanagari Hindi	17.9	53.2
Fine-tuned	Llama-3.1-8B-Instruct	Standard English	56.1	78.7
Fine-tuned	Llama-3.1-8B-Instruct	Romanized Hindi	66.6	85.9
Fine-tuned	Llama-3.1-8B-Instruct	Devanagari Hindi	73.5	86.2

Example Inference

Translate the following Hinglish sentence into Standard English, Romanized Hindi, and Devanagari Hindi:

लंदन के Madame Tussauds में Deepika Padukone के wax statue का गुरुवार को अनावरण हुआ।

Output:

Standard English: Deepika Padukone's wax statue was unveiled at Madame Tussauds in London on Thursday.

Romanized Hindi: London ke Madame Tussauds mein Deepika Padukone ke wax statue ka guruvaar ko anavaran hua.

Devanagari Hindi: लंदन के मैडम तुसाद में दीपिका पादुकोण के वैक्स स्टैच्यू का गुरुवार को अनावरण हुआ।

Citation

@inproceedings{sheth-etal-2025-comi,
  title = "{COMI}-{LINGUA}: Expert Annotated Large-Scale Dataset for Multitask {NLP} in {H}indi-{E}nglish Code-Mixing",
  author = "Sheth, Rajvee and
               Beniwal, Himanshu and
               Singh, Mayank",
  editor = "Christodoulopoulos, Christos and
               Chakraborty, Tanmoy and
               Rose, Carolyn and
               Peng, Violet",
  booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
  month = nov,
  year = "2025",
  address = "Suzhou, China",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2025.findings-emnlp.422/",
  pages = "7973--7992",
  isbn = "979-8-89176-335-7"
}

COMI-LINGUA-MT

Metadata

License

Apache 2.0

Hosted By

Rajvee Sheth, Mayank Singh

Task Type

Transformers

Model Format

Transformers

Visibility

Open

Source Organisation

IITGN

Sector

Science, Technology and Research

Updated Date & Time

10/02/26 11:05:28

Created By

Lingo Research Group

Size

1.89 GB

adapter_config.json ( 916 Bytes )

To preview this file, you need to be a registered user. Please complete the registration process to gain access and continue viewing the content.

Activity Overview

0
6
1.89 GB
141

License Control

Apache 2.0

Version Control

Version 1(1.89 GB)

admin·4 month(s) ago
- adapter_config.json
- adapter_model.safetensors
- chat_template.jinja
- optimizer.pt
- README.md
- rng_state.pth
- scaler.pt
- scheduler.pt
- special_tokens_map.json
- 3 more

More Models from IITGN

COMI-LINGUA-POS

This is a fine-tuned version of aya-expanse-8b for Part-of-Speech (POS) Tagging on Hinglish (Hindi-English code-mixed) text. It assigns a grammatical category to each token using a language-agnostic Universal POS tagset suitable for code-mixed content in Roman and Devanagari scripts.

Hinglish

0
4
979.67 MB
131

Updated 3 month(s) ago

IITGN

View Details

COMI-LINGUA-MT

Code-Mixing

Hinglish

0
6
1.89 GB
142

Updated 3 month(s) ago

IITGN

View Details

COMI-LINGUA-MLI

This is a fine-tuned version of aya-expanse-8b for Part-of-Speech (POS) Tagging on Hinglish (Hindi-English code-mixed) text. It classifies each sentence at the sentence level into the dominant matrix language governing the grammatical structure: hi (Hindi) or en (English).

Hinglish

Code-Mixing

0
3
1.89 GB
128

Updated 3 month(s) ago

IITGN

View Details

COMI-LINGUA-LID

This is a fine-tuned version of aya-expanse-8b for Token-level Language Identification (LID) on Hinglish (Hindi-English code-mixed) text. It performs token-wise classification into three categories: en (English), hi (Hindi), or ot (Other).

Code-Mixing

Hinglish

0
14
1.89 GB
178

Updated 3 month(s) ago

IITGN

View Details

COMI-LINGUA-NER

This is a fine-tuned version of aya-expanse-8b for Named Entity Recognition (NER) on Hinglish (Hindi-English code-mixed) text. It helps with token-level entity tagging (PERSON, ORGANISATION, LOCATION, DATE, TIME, GPE, HASHTAG, EMOJI, MENTION, X/Other) in Roman/Devanagari scripts. Achieves 94.90 F1 on COMI-LINGUA test set (5K instances), outperforming the zero-shot inference (59.88 F1).

Code-Mixing

Hinglish

0
8
979.66 MB
175

Updated 3 month(s) ago

IITGN

View Details

Ganga-2-1B

The first pre-trained Hindi model by any academic research lab in India 🇮🇳!

Text Generation

1
44
1.88 GB
533

Updated 11 month(s) ago

IITGN

View Details

Accessibility options by UX4G

COMI-LINGUA-MT

About Model

Hindi-English Translation Model

Supported Target Formats

Model Overview

Performance

Example Inference

Citation

COMI-LINGUA-MT

Metadata

adapter_config.json ( 916 Bytes )

Activity Overview

Tags

License Control

Version Control

Version 1(1.89 GB)

adapter_config.json

adapter_model.safetensors

chat_template.jinja

optimizer.pt

README.md

rng_state.pth

scaler.pt

scheduler.pt

special_tokens_map.json

More Models from IITGN

AIKosh

Resources

Support