AI4Bharat Textual Language Detection

Detects the language of the provided text. Currently supports 22 languages.

About Model

IndicLID is a language identifier for all 22 Indian languages listed in the Indian constitution, covering both native-script and romanized text. It is the first LID for romanized text in Indian languages. It is a two-stage classifier: an ensemble of a fast linear classifier and a slower classifier fine-tuned from a pre-trained LM. It can predict 47 classes (24 native-script classes and 21 roman-script classes, plus English and Others). IndicLID is evaluated on the Bhasha-Abhijnaanam benchmark, which is released along with this work. For native-script text, IndicLID has better language coverage than existing LIDs and is competitive with or better than other LIDs. The IndicLID model is 10 times faster and 4 times smaller than the NLLB model, and it also establishes strong baseline results on roman-script text.
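One common way to combine a fast classifier with a slower one is a confidence-based cascade: accept the fast prediction when it is confident, otherwise fall back to the slow model. The sketch below illustrates that idea only. The routing rule, the 0.6 threshold, the class name `TwoStageLID`, the label strings, and both toy classifiers are assumptions made for illustration; they are not taken from the IndicLID code or this page.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Prediction = Tuple[str, float]  # (label, confidence in [0, 1])


@dataclass
class TwoStageLID:
    """Hypothetical cascade: try the fast model, fall back to the slow one."""
    fast: Callable[[str], Prediction]
    slow: Callable[[str], Prediction]
    threshold: float = 0.6  # assumed value, chosen only for this example

    def predict(self, text: str) -> Tuple[str, float, str]:
        label, conf = self.fast(text)
        if conf >= self.threshold:
            return label, conf, "fast"
        # Low confidence: defer to the slower, stronger classifier.
        label, conf = self.slow(text)
        return label, conf, "slow"


def toy_fast(text: str) -> Prediction:
    """Stand-in for a linear classifier: confident only on Devanagari script."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return "other", 0.0
    deva = sum(1 for ch in letters if "\u0900" <= ch <= "\u097f")
    frac = deva / len(letters)
    if frac > 0.5:
        return "hin_Deva", frac
    # Roman script is ambiguous for this toy model, so report low confidence.
    return "eng_Latn", 0.5


def toy_slow(text: str) -> Prediction:
    """Stand-in for a fine-tuned LM: a keyword lookup for romanized Hindi."""
    hindi_words = {"hai", "nahi", "kya", "yeh"}
    if set(text.lower().split()) & hindi_words:
        return "hin_Latn", 0.9
    return "eng_Latn", 0.9


lid = TwoStageLID(fast=toy_fast, slow=toy_slow)
for sample in ["यह एक वाक्य है", "yeh ek vakya hai", "this is a sentence"]:
    print(sample, "->", lid.predict(sample))
```

In this toy setup the native-script sentence is resolved by the fast stage alone, while both roman-script sentences are routed to the slow stage. A cascade of this kind spends the expensive model only on the inputs that need it.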

Metadata

MIT

AI4Bharat

Text Language Detection

N.A.

Open

AI4Bharat

Sector Agnostic

21/02/25 13:21:38

0

Activity Overview

  • Downloads 0
  • Redirect 24
  • Views 497
  • File Size 0

Tags

  • Text Language Detection
  • Bhashini
  • AI4Bharat
  • Deep Learning
  • NLP
  • Multilingual
  • Text Processing

License Control

MIT

More Models from AI4Bharat

AI4Bharat- 500 M - RomanSetu Multilingual Native-to-Roman Model
RomanSetu is a multilingual continual pretrained transformer model designed for transliteration across six Indic languages
Instruction-Tuning
LLaMA2
Multilingual
Llama
  • See Upvoters 1
  • Downloads 30
  • File Size 0
  • Views 441
Updated 7 month(s) ago

AI4BHARAT

AI4Bharat- 400 M - RomanSetu Multilingual Native-to-Roman Model
RomanSetu is a multilingual continual pretrained transformer model designed for transliteration across six Indic languages
Multilingual
Llama
Instruction-Tuning
LLaMA2
  • See Upvoters 1
  • Downloads 53
  • File Size 0
  • Views 563
Updated 7 month(s) ago

AI4BHARAT

AI4Bharat- Maithili - IndicConformer Automatic Speech Recognition (ASR) Model
This model takes in mono-channel audio files at a 16,000 Hz sampling rate (WAV format) and outputs the transcribed text of the speech contained in the audio.
Automatic Speech Recognition
Speech-to-Text
NLP
  • See Upvoters 0
  • Downloads 15
  • File Size 0
  • Views 337
Updated 7 month(s) ago

AI4BHARAT

AI4Bharat- Konkani - IndicConformer Automatic Speech Recognition (ASR) Model
Automatic Speech Recognition (ASR) model for Konkani speech recognition. It processes mono WAV audio at a 16,000 Hz sampling rate and transcribes the spoken content into text.
Speech-to-Text
NLP
Automatic Speech Recognition
  • See Upvoters 0
  • Downloads 21
  • File Size 0
  • Views 402
Updated 7 month(s) ago

AI4BHARAT

AI4Bharat- Kashmiri - IndicConformer Automatic Speech Recognition (ASR) Model
This Automatic Speech Recognition (ASR) model transcribes Kashmiri speech from mono WAV audio files at a 16,000 Hz sampling rate into text.
Kashmiri
Speech-to-Text
NLP
Automatic Speech Recognition
  • See Upvoters 0
  • Downloads 14
  • File Size 0
  • Views 390
Updated 7 month(s) ago

AI4BHARAT

AI4Bharat - RomanSetu-200M - Multilingual LLM for Indian languages using romanization
RomanSetu efficiently unlocks the multilingual (Indian languages) capabilities of large language models via romanization.
Instruction-Tuning
LLaMA2
Llama
Multilingual
  • See Upvoters 0
  • Downloads 1
  • File Size 0
  • Views 144
Updated 7 month(s) ago

AI4BHARAT

AI4Bharat - RomanSetu-100M - Multilingual LLM for Indian languages using romanization
RomanSetu efficiently unlocks the multilingual (Indian languages) capabilities of large language models via romanization.
Llama
Multilingual
Instruction-Tuning
LLaMA2
  • See Upvoters 0
  • Downloads 6
  • File Size 0
  • Views 219
Updated 7 month(s) ago

AI4BHARAT

AI4Bharat- Kannada - IndicConformer Automatic Speech Recognition (ASR) Model
This Kannada Automatic Speech Recognition (ASR) model transcribes 16 kHz mono-channel audio into text. It uses a Conformer-Large architecture with 120M parameters and a hybrid CTC-RNNT decoder for high-accuracy speech recognition.
Automatic Speech Recognition
Audio Processing
NLP
  • See Upvoters 0
  • Downloads 13
  • File Size 0
  • Views 341
Updated 7 month(s) ago

AI4BHARAT

AI4Bharat – Romanized Path – Base to Supervised Fine-Tuning (SFT)
The RomanSetu model is built on a base pretrained model that is then supervised fine-tuned on instruction-following tasks using romanized Indian languages.
LLaMA2
Instruction-Tuning
Multilingual
Llama
  • See Upvoters 0
  • Downloads 2
  • File Size 0
  • Views 84
Updated 7 month(s) ago

AI4BHARAT

AI4Bharat-IndicTrans2 Large-1B - English-to-Hindi (Devanagari): Language Translation Model
A large-scale neural machine translation (NMT) model for translating English to Hindi (Devanagari), using 1 billion parameters for high-quality translations.
Machine Translation
Transformer
low-resource-NLP
high-quality-translation
Large Model
cross-lingual
NLP
Multilingual
  • See Upvoters 0
  • Downloads 23
  • File Size 0
  • Views 422
Updated 7 month(s) ago

AI4BHARAT