This is a fine-tuned version of Llama-3.1-8B-Instruct for Machine Translation (MT) on Hinglish (Hindi-English code-mixed) text. It translates code-mixed input in Roman/Devanagari scripts to three target formats: (i) Standard English, (ii) Romanized Hindi, and (iii) Devanagari Hindi.
The model is a LoRA-adapted Transformer LLM, fine-tuned for Hinglish-to-monolingual translation that preserves natural fluency and code-mixing nuances.
Base model: meta-llama/Llama-3.1-8B-Instruct

On the COMI-LINGUA MT test set (5K instances), the fine-tuned model achieves strong gains across all three target formats, setting new benchmarks for Hinglish-to-monolingual translation among open-weight models. It significantly outperforms zero-shot and one-shot prompting of the same base model, as well as several larger or closed-weight LLMs. For Devanagari Hindi, for instance, BLEU rises from 7.4 (zero-shot) and 17.9 (one-shot) to 73.5 after fine-tuning.
| Setting | Model | Target Language | BLEU | chrF++ |
|---|---|---|---|---|
| Zero-shot | Llama-3.1-8B-Instruct | Standard English | 38.3 | 67.5 |
| Zero-shot | Llama-3.1-8B-Instruct | Romanized Hindi | 15.6 | 49.2 |
| Zero-shot | Llama-3.1-8B-Instruct | Devanagari Hindi | 7.4 | 13.5 |
| One-shot | Llama-3.1-8B-Instruct | Standard English | 45.8 | 72.4 |
| One-shot | Llama-3.1-8B-Instruct | Romanized Hindi | 35.3 | 67.0 |
| One-shot | Llama-3.1-8B-Instruct | Devanagari Hindi | 17.9 | 53.2 |
| Fine-tuned | Llama-3.1-8B-Instruct | Standard English | 56.1 | 78.7 |
| Fine-tuned | Llama-3.1-8B-Instruct | Romanized Hindi | 66.6 | 85.9 |
| Fine-tuned | Llama-3.1-8B-Instruct | Devanagari Hindi | 73.5 | 86.2 |
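To make the size of the improvement concrete, the BLEU gains of the fine-tuned model over the two prompting baselines can be worked out directly from the table. The snippet below is only an illustrative calculation: the scores are copied from the table above, while the dictionary layout and the `bleu_gains` helper are assumptions introduced for this example.

```python
# BLEU scores copied from the results table above.
BLEU = {
    "Standard English": {"zero-shot": 38.3, "one-shot": 45.8, "fine-tuned": 56.1},
    "Romanized Hindi": {"zero-shot": 15.6, "one-shot": 35.3, "fine-tuned": 66.6},
    "Devanagari Hindi": {"zero-shot": 7.4, "one-shot": 17.9, "fine-tuned": 73.5},
}


def bleu_gains(scores):
    """Return absolute BLEU gains of the fine-tuned model per target format.

    Hypothetical helper for illustration; it is not part of the model release.
    """
    gains = {}
    for target, row in scores.items():
        gains[target] = {
            # Round to one decimal to match the precision of the table.
            "vs_zero_shot": round(row["fine-tuned"] - row["zero-shot"], 1),
            "vs_one_shot": round(row["fine-tuned"] - row["one-shot"], 1),
        }
    return gains


if __name__ == "__main__":
    for target, gain in bleu_gains(BLEU).items():
        print(target, gain)
```

The largest absolute gain is on Devanagari Hindi (66.1 BLEU points over zero-shot prompting), and the smallest is on Standard English (10.3 points over one-shot prompting).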
Translate the following Hinglish sentence into Standard English, Romanized Hindi, and Devanagari Hindi:
लंदन के Madame Tussauds में Deepika Padukone के wax statue का गुरुवार को अनावरण हुआ।
Output:
Standard English: Deepika Padukone's wax statue was unveiled at Madame Tussauds in London on Thursday.

Romanized Hindi: London ke Madame Tussauds mein Deepika Padukone ke wax statue ka guruvaar ko anavaran hua.

Devanagari Hindi: लंदन के मैडम तुसाद में दीपिका पादुकोण के वैक्स स्टैच्यू का गुरुवार को अनावरण हुआ।
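The prompt and output format shown above could be wired up roughly as follows. This is a minimal sketch under several assumptions, not the authors' inference code: `MODEL_ID` is a placeholder (the card does not state a repository ID), the checkpoint is assumed to load as a regular causal LM through Hugging Face Transformers (a LoRA-only adapter would need to be attached to the base model first), and the plain-text prompt is assumed to work without a chat template. The `build_prompt`, `parse_output`, and `translate` helpers are hypothetical names introduced for this example.

```python
import re

# Placeholder, NOT a real repository ID: replace with the actual checkpoint path.
MODEL_ID = "path/to/hinglish-mt-checkpoint"

# Output labels as they appear in the example above.
LABELS = ("Standard English", "Romanized Hindi", "Devanagari Hindi")

PROMPT_TEMPLATE = (
    "Translate the following Hinglish sentence into Standard English, "
    "Romanized Hindi, and Devanagari Hindi:\n{sentence}\nOutput:"
)


def build_prompt(sentence: str) -> str:
    """Fill the instruction template with one code-mixed sentence."""
    return PROMPT_TEMPLATE.format(sentence=sentence.strip())


def parse_output(text: str) -> dict:
    """Split generated text into one entry per target format.

    Expects only the newly generated text (not the prompt), laid out as
    'Standard English: ... Romanized Hindi: ... Devanagari Hindi: ...'.
    """
    pattern = re.compile(
        "(" + "|".join(re.escape(label) for label in LABELS) + r")\s*:\s*"
    )
    # re.split with a capture group yields [preamble, label, body, label, body, ...].
    parts = pattern.split(text)
    return {label: body.strip() for label, body in zip(parts[1::2], parts[2::2])}


def translate(sentence: str, model_id: str = MODEL_ID, max_new_tokens: int = 256) -> dict:
    """Generate all three translations for one sentence (downloads the model)."""
    # Imported lazily so the prompt/parsing helpers work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(build_prompt(sentence), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Keep only the tokens generated after the prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return parse_output(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

Calling `translate(...)` with a Hinglish sentence would return a dictionary keyed by the three target formats, provided the model follows the output layout shown in the example. Greedy decoding (`do_sample=False`) is chosen here only to keep the sketch deterministic.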
@inproceedings{sheth-etal-2025-comi,
title = "{COMI}-{LINGUA}: Expert Annotated Large-Scale Dataset for Multitask {NLP} in {H}indi-{E}nglish Code-Mixing",
author = "Sheth, Rajvee and
Beniwal, Himanshu and
Singh, Mayank",
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Rose, Carolyn and
Peng, Violet",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-emnlp.422/",
pages = "7973--7992",
isbn = "979-8-89176-335-7"
}

License: Apache 2.0
Rajvee Sheth, Mayank Singh
Transformers
Open
Science, Technology and Research
10/02/26 11:05:28
1.89 GB