ORGANISATION

Bhashini-AI4Bharat Textual Language Detection v1.0

Detects the language of the provided text. Currently supports 23 languages (English, Bangla, Manipuri, Bodo, Konkani, Oriya, Nepali, Marathi, Sindhi, Sanskrit, Malayalam, Urdu, Assamese, Telugu, Dogri, Gujarati, Kashmiri, Punjabi, Santali, Maithili, Hindi, Tamil, Kannada).

About Model

IndicLID is a language identifier for all 22 Indian languages listed in the Indian constitution, covering both native-script and romanized text. It is the first LID for romanized text in Indian languages. It is a two-stage classifier: an ensemble of a fast linear classifier and a slower classifier fine-tuned from a pre-trained LM. It can predict 47 classes (24 native-script classes and 21 roman-script classes, plus English and Others). IndicLID is evaluated on the Bhasha-Abhijnaanam benchmark, which is released along with this work. For native-script text, IndicLID has better language coverage than existing LIDs and is competitive with or better than them. The IndicLID model is 10 times faster and 4 times smaller than the NLLB model, and it also establishes strong baseline results on roman-script text.
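The two-stage idea described above can be sketched as a confidence-gated fallback: the fast classifier answers when it is confident, and the slower classifier is consulted otherwise. This is a minimal illustration only, not IndicLID's actual code. The function names, the toy classifiers, the labels, and the 0.6 threshold are all assumptions made for the example; the model card does not state how the two stages are combined.

```python
from typing import Callable, Tuple

# A prediction is a (label, confidence) pair, with confidence in [0, 1].
Prediction = Tuple[str, float]
Classifier = Callable[[str], Prediction]


def two_stage_identify(
    text: str,
    fast: Classifier,
    slow: Classifier,
    threshold: float = 0.6,  # hypothetical cut-off, not taken from the model card
) -> Prediction:
    """Return the fast prediction when it is confident, else defer to the slow model."""
    label, confidence = fast(text)
    if confidence >= threshold:
        return label, confidence
    return slow(text)


# Toy stand-ins for the two stages (hypothetical; the real stages are trained models).
def toy_fast(text: str) -> Prediction:
    # Confident only when the text contains Devanagari characters.
    # Real Devanagari text could also be Marathi, Nepali, etc.; this is a toy rule.
    if any("\u0900" <= ch <= "\u097f" for ch in text):
        return "Hindi (native script)", 0.9
    return "Others", 0.3


def toy_slow(text: str) -> Prediction:
    # Pretend the slower model recognises a romanized Hindi greeting.
    if "namaste" in text.lower():
        return "Hindi (roman script)", 0.8
    return "English", 0.7


# Class count stated in the model card: 24 native + 21 roman + English + Others.
NUM_CLASSES = 24 + 21 + 2

if __name__ == "__main__":
    print(two_stage_identify("नमस्ते", toy_fast, toy_slow))
    print(two_stage_identify("namaste dost", toy_fast, toy_slow))
    print(NUM_CLASSES)
```

Gating on confidence keeps the common case cheap, since only inputs the fast stage is unsure about pay the cost of the slower model. A real deployment would replace the toy functions with the trained classifiers and tune the threshold on held-out data.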

Metadata

License

MIT

Hosted By

AI4Bharat

Task Type

OCR (Optical Character Recognition) Model

Model Format

Other

Visibility

Open

Source Organisation

Digital India BHASHINI Division

Sector

Sector Agnostic

Updated Date & Time

06/07/26 16:08:10

Created By

Shailendra Pal Singh

Size

3 MB

License Control

MIT

Version Control

Version 2 (3 MB)

admin · 1 year(s) ago
- Benchmark
  - compile_final_pilot_1.py
  - create_benchmark_extra.py
  - create_benchmark.py
- deployement
- filter_Dakshina
- final_runs_ACL_inference
- final_runs_train
- Inference
- nueral_net
- preprocess_indiccorp
- README.md

Version 1 (4.91 KB)

admin · 1 year(s) ago

No File(s) Found!
