
IndicXlit

A Transformer-based multilingual transliteration model

About Model

Bhashini - IndicXlit is a Transformer-based multilingual transliteration model trained on the Aksharantar dataset, which was the largest publicly available collection of parallel transliteration corpora for Indic languages at the time of writing (20 May 2022). It converts romanized text written in an Indian language (for example, Hinglish) into the native Indic script (for example, Devanagari for Hindi). It supports 21 Indic languages: Assamese, Bangla, Bodo, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Maithili, Malayalam, Manipuri, Marathi, Nepali, Oriya, Panjabi, Sanskrit, Sindhi, Sinhala, Tamil, Telugu, Urdu.
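
As an illustration of the roman-to-native conversion described above, here is a minimal Python sketch. It assumes the `ai4bharat-transliteration` pip package and its `XlitEngine` class, neither of which is named on this page. The helper names `SUPPORTED_LANGUAGES`, `is_supported`, and `transliterate_word` are hypothetical and exist only for this example.

```python
# Language names copied from the model description on this page.
SUPPORTED_LANGUAGES = [
    "Assamese", "Bangla", "Bodo", "Gujarati", "Hindi", "Kannada", "Kashmiri",
    "Konkani", "Maithili", "Malayalam", "Manipuri", "Marathi", "Nepali",
    "Oriya", "Panjabi", "Sanskrit", "Sindhi", "Sinhala", "Tamil", "Telugu",
    "Urdu",
]


def is_supported(language: str) -> bool:
    """Case-insensitive check against the languages listed on this page."""
    known = {name.lower() for name in SUPPORTED_LANGUAGES}
    return language.strip().lower() in known


def transliterate_word(word: str, lang_code: str = "hi", topk: int = 5):
    """Return candidate native-script spellings for a romanized word.

    Assumption: relies on the `ai4bharat-transliteration` package
    (pip install ai4bharat-transliteration), which this page does not
    mention. `lang_code` is a short language code such as "hi" for Hindi.
    """
    # Imported lazily so the helpers above work without the package installed.
    from ai4bharat.transliteration import XlitEngine

    engine = XlitEngine(lang_code, beam_width=10, rescore=True)
    return engine.translit_word(word, topk=topk)
```

Under these assumptions, a call along the lines of `transliterate_word("namaste", lang_code="hi")` would ask the model for up to five Devanagari candidates. The exact return format depends on the installed package version.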


Metadata

MIT

AI4Bharat

Machine Translation Model

Other

Open

Sector Agnostic

05/03/25 15:25:05

Admin

3.94 MB

IndicXlit-master (3 files, 12 directories)


Directory
ablation_study

2 directories

Directory
app

10 files, 1 directory

Directory
Checker

3 files

Directory
corpus_preprocessing

5 directories

Directory
data_mining

1 file, 2 directories

Directory
Dataset_Format

2 files

Directory
inference

2 directories

Directory
model_training_scripts

1 file, 7 directories

File
.gitignore

1.79 KB

File
LICENSE

1.04 KB

This preview shows 10 of 15 items.

Activity Overview

  • Downloads0
  • Downloads 28
  • File Size 3.94 MB
  • Views 907

Tags

  • Language Modeling
  • Multilingual Translation
  • Machine Translation
  • Regional Languages
  • Indian Languages
  • NLP
  • transliteration

License Control

MIT

Version Control

Version 1 (3.94 MB)
  • admin · 1 year(s) ago
    • Folder: IndicXlit-master
      • Folder: ablation_study
      • Folder: app
      • Folder: Checker
      • Folder: corpus_preprocessing
      • Folder: data_mining
      • Folder: Dataset_Format
      • Folder: inference
      • Folder: model_training_scripts
      • File: .gitignore
      • File: LICENSE
      • 5 more items
