Indic Trans2

AI4Bharat's Indic-Trans-v2 is a multilingual Transformer NMT model (~1.1B parameters) trained on the Samanantar v2 dataset, which was the largest publicly available collection of parallel corpora for the languages of India at the time of writing (23 March 2023). We currently release two models, Indic to English and English to Indic, which together support all 22 scheduled languages of India.
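
Because the release consists of two directional models, a caller has to decide which one a given language pair needs. The sketch below is only an illustration of that routing, not code from the repository: the direction labels "indic-en" and "en-indic", the short language codes, and the idea of pivoting through English for Indic-to-Indic pairs are all assumptions made for the example.

```python
def plan_translation(src: str, tgt: str) -> list[str]:
    """Return the ordered model directions needed to translate src -> tgt.

    "en" stands for English; any other code is treated as an Indic language.
    The labels "indic-en" and "en-indic" are hypothetical names for the two
    released models.
    """
    if src == tgt:
        return []                      # nothing to translate
    if src == "en":
        return ["en-indic"]            # English -> Indic model
    if tgt == "en":
        return ["indic-en"]            # Indic -> English model
    # Neither side is English: assume a two-step pivot through English.
    return ["indic-en", "en-indic"]


if __name__ == "__main__":
    for pair in [("hi", "en"), ("en", "ta"), ("hi", "ta")]:
        print(pair, "->", plan_translation(*pair))
```

Pivoting doubles inference cost and can compound errors, so it is shown here only as the simplest fallback available with two directional models.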

About Model

Bhashini - IndicTrans2 is the first open-source Transformer-based multilingual NMT model that supports high-quality translation across all 22 scheduled Indic languages, including multiple scripts for low-resource languages such as Kashmiri, Manipuri and Sindhi. It adopts script unification wherever feasible to leverage transfer learning through lexical sharing between languages. Overall, the model supports five scripts: Perso-Arabic (Kashmiri, Sindhi, Urdu), Ol Chiki (Santali), Meitei (Manipuri), Latin (English), and Devanagari (used for all the remaining languages).
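
To make the idea of script unification concrete, here is a minimal sketch that maps text from several Indic scripts onto Devanagari. It relies on the fact that the major Indic Unicode blocks are 128 code points wide and laid out in parallel, so a fixed offset gives an approximate conversion. This is not the project's actual preprocessing code; the function name `to_devanagari` and the script labels are hypothetical, and a real pipeline would handle script-specific exceptions that a plain offset ignores.

```python
# Start of each 128-code-point Unicode block (values from the Unicode charts).
DEVANAGARI_START = 0x0900
BLOCK_STARTS = {
    "bengali": 0x0980,
    "gurmukhi": 0x0A00,
    "gujarati": 0x0A80,
    "oriya": 0x0B00,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80


def to_devanagari(text: str, script: str) -> str:
    """Approximate script unification by shifting code points to Devanagari.

    Characters outside the source script's block (spaces, digits, Latin
    letters, punctuation) are passed through unchanged.
    """
    start = BLOCK_STARTS[script]
    offset = DEVANAGARI_START - start
    out = []
    for ch in text:
        cp = ord(ch)
        if start <= cp < start + BLOCK_SIZE:
            out.append(chr(cp + offset))   # same slot in the Devanagari block
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # Bengali letter KA (U+0995) lands on Devanagari letter KA (U+0915).
    print(to_devanagari("\u0995", "bengali"))
```

Once related languages share one script, cognate words tend to produce overlapping subword tokens, which is the lexical sharing the description refers to. Perso-Arabic, Ol Chiki and Meitei do not follow this block layout, which is consistent with them being kept as separate scripts.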

We open-source our training dataset (BPCC), back-translation data (BPCC-BT), final IndicTrans2 models, evaluation benchmarks (IN22, which includes IN22-Gen and IN22-Conv), and training and inference scripts for easier use and adoption within the research community. We hope this will foster more research on low-resource Indic languages and lead to further improvements in low-resource translation quality through contributions from the research community.

This code repository contains instructions for downloading the artifacts associated with IndicTrans2, as well as the code for training/fine-tuning the multilingual NMT models.

For more details about using the model, refer to the GitHub repository: https://github.com/AI4Bharat/IndicTrans2/tree/main

Metadata

MIT

AI4Bharat

Machine Translation Model

Other

Open

Sector Agnostic

05/03/25 15:24:29

Admin

214.60 KB

Activity Overview

  • Downloads0
  • Downloads 58
  • Views 990
  • File Size 214.60 KB

Tags

  • Machine Translation
  • Computational Linguistics
  • Language Modeling
  • Bilingual Translation
  • Multilingual Translation
  • Regional Languages
  • Indian Languages
  • Indic-TransV2
  • NLP

License Control

MIT

Version Control

Version 1 (214.60 KB)
  • admin · 1 year(s) ago
    • IndicTrans2-main/
      • baseline_eval/
      • huggingface_interface/
      • .gitignore
      • apply_sentence_piece.sh
      • compute_comet_score.sh
      • compute_metrics_significance.sh
      • compute_metrics.sh
      • eval_rev.sh
      • eval.sh
      • finetune.sh
      • 15 more
