ORGANISATION

Bhashini - IndicNER

IndicNER is a multilingual Named Entity Recognition model fine-tuned on 11 Indian languages to identify named entities in text.

Digital India BHASHINI Division
BHASHINI_shailendra

About Model

IndicNER is a state-of-the-art multilingual Named Entity Recognition (NER) model developed by Bhashini. It is designed to recognize and classify named entities such as names of persons, organizations, locations, dates, and more from text in 11 Indian languages:

Hindi, Bengali, Tamil, Telugu, Gujarati, Punjabi, Marathi, Assamese, Kannada, Malayalam and Oriya.

Training Dataset:

The model is fine-tuned using a large corpus derived from publicly available Indian NER datasets and human-annotated test sets, ensuring high accuracy across different languages. Additionally, it has been trained on data sourced from the Samanantar Corpus, India's largest parallel corpus, to enhance its contextual understanding. The base model used for fine-tuning is BERT-base-multilingual-uncased, which allows it to capture linguistic nuances effectively.

Use Cases:

IndicNER can be used for a wide range of Natural Language Processing (NLP) applications, including:

1. Automated document processing – Extracting key entities from government, legal, and business documents.

2. Chatbots and virtual assistants – Enhancing conversational AI by identifying user queries related to people, places, and organizations.

3. News and content analysis – Automatically tagging and categorizing entities in multilingual news articles.

4. Healthcare and medical records – Identifying patient details and medical terms for structured data extraction.
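The use cases above all rest on one step: turning token-level tags into whole entities. The sketch below assumes the tagger emits BIO-style labels (B-PER, I-PER, O, and so on). That scheme, the helper names `group_bio` and `index_by_type`, and the hand-written tags in the usage comment are illustrative assumptions, not taken from this page.

```python
from collections import defaultdict


def group_bio(tokens, tags):
    """Merge token-level BIO tags into (entity_type, text) spans.

    Assumes tags look like "B-PER", "I-PER" or "O" (an assumption about
    the label scheme; check the model's config for the real label set).
    """
    spans = []
    current_type, current_tokens = None, []

    def flush():
        # Close the span being built, if any.
        if current_tokens:
            spans.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new:
            # A "B-" tag, or a stray "I-" of a different type, opens a span.
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            current_tokens.append(token)
        else:
            # "O" (outside any entity) ends the current span.
            flush()
            current_type, current_tokens = None, []
    flush()
    return spans


def index_by_type(spans):
    """Group entity strings by type, e.g. for tagging a news article."""
    index = defaultdict(list)
    for entity_type, text in spans:
        index[entity_type].append(text)
    return dict(index)


# Usage with hand-written (not model-generated) tags:
# tokens = ["सचिन", "तेंदुलकर", "मुंबई", "में", "रहते", "हैं"]
# tags = ["B-PER", "I-PER", "B-LOC", "O", "O", "O"]
# index_by_type(group_bio(tokens, tags))
```

Keeping the grouping separate from the model call means the same helper can serve document processing, chatbot slot filling, or content tagging.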

For more details and implementation, visit: https://huggingface.co/ai4bharat/IndicNER.
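As a rough starting point, the checkpoint at the link above can probably be loaded through the Hugging Face `transformers` token-classification pipeline. This is a minimal sketch under that assumption, not the method documented by the model authors. The helper names `build_ner_pipeline` and `extract_entities` and the `min_score` threshold are hypothetical.

```python
MODEL_ID = "ai4bharat/IndicNER"  # repository named in the link above


def build_ner_pipeline(model_id=MODEL_ID):
    """Create a token-classification pipeline (downloads the model on first use)."""
    # Imported lazily so extract_entities can be used and tested without
    # transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" merges word pieces into whole entities.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def extract_entities(text, ner, min_score=0.5):
    """Run `ner` on `text` and keep entities scoring at least `min_score`.

    `ner` is any callable returning a list of dicts with the keys
    "word", "entity_group" and "score", as the aggregated pipeline does.
    """
    return [
        (r["word"], r["entity_group"], round(float(r["score"]), 3))
        for r in ner(text)
        if r["score"] >= min_score
    ]


# Usage (requires network access and the transformers package):
# ner = build_ner_pipeline()
# print(extract_entities("सचिन तेंदुलकर मुंबई में रहते हैं", ner))
```

Passing the pipeline in as an argument keeps the filtering logic independent of the model, so it can be exercised with a stub before the 591 MB download.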


Metadata

License: MIT
Hosted By: AI4Bharat
Model Type: Named Entity Recognition (NER) Model
Model Format: Other
Visibility: Open
Source Organisation: Digital India BHASHINI Division
Sector: Sector Agnostic
Updated Date & Time: 05/03/25 15:23:12
Created By: Admin
Size: 591.28 MB

config.json ( 1.16 KB )


Activity Overview

1
125
591.12 MB
2,558

License Control

MIT

Version Control

Version 2 (591.12 MB)

admin·1 year(s) ago
- config.json
- pytorch_model.bin
- README.md
- special_tokens_map.json
- tokenizer_config.json
- tokenizer.json
- vocab.txt
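After downloading Version 2, a quick check that all seven listed files are present can catch an incomplete download before loading the model. The sketch below assumes the files sit together in one local directory. The function name `missing_files` and the example path are hypothetical.

```python
from pathlib import Path

# File names as listed under Version 2 on this page.
EXPECTED_FILES = [
    "config.json",
    "pytorch_model.bin",
    "README.md",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "tokenizer.json",
    "vocab.txt",
]


def missing_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]


# Usage (the path is a placeholder):
# print(missing_files("./IndicNER"))
```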

Version 1 (162.42 KB)

admin·1 year(s) ago

No File(s) Found!

More Models from TechCorp

SANTHAM-Gemma3-4B-SH-Seg-Poetry-Finetuned

SANTHAM-Gemma3-4B-SH-Seg-Poetry-Finetuned is a Sanskrit-to-Tamil translation model specialized for segmented text produced by the Sanskrit Heritage segmenter.

translation

poetry

santham

Segmened

language:tam

language:san

0
17
115.62 MB
119

Updated 3 month(s) ago

DIGITAL INDIA BHASHINI DIVISION


SANTHAM-Gemma3-4B-Finetuned

SANTHAM-Gemma3-4B-Finetuned is a Sanskrit → Tamil translation model built on the Gemma 3 (4B) architecture. It is trained on a parallel corpus developed as part of the Sanskrit Knowledge Accessor project, enabling it to capture linguistic nuances and generate fluent Tamil translations from classical Sanskrit inputs.

translation

language:san

language:tam

santham

0
13
2.08 GB
197

Updated 3 month(s) ago

DIGITAL INDIA BHASHINI DIVISION


SANTHAM-Gemma3-4B-Anvaya-Poetry-Finetuned

SANTHAM-Gemma3-4B-Anvaya-Poetry-Finetuned is a Sanskrit-to-Tamil translation model specialized for Anvaya translation of poetry.

poetry

santham

anvaya

language:tam

language:san

translation

0
13
2.09 GB
138

Updated 3 month(s) ago

DIGITAL INDIA BHASHINI DIVISION


SPRING-INX-DATA2VEC-AQC-URDU

Automatic Speech Recognition (ASR) model that processes audio and transcribes spoken content into text.

spring_lab

Data2vec_aqc

low-resource-language

SSL_finetunning

ssl

urdu

IITM

0
3
3.52 GB
126

Updated 4 month(s) ago

DIGITAL INDIA BHASHINI DIVISION


SPRING-INX-DATA2VEC-AQC-TELUGU

Automatic Speech Recognition (ASR) model that processes audio and transcribes spoken content into text.

spring_lab

low-resource-language

SSL_finetunning

Data2vec_aqc

IITM

telugu

ssl

0
3
3.52 GB
113

Updated 4 month(s) ago

DIGITAL INDIA BHASHINI DIVISION


SPRING-INX-DATA2VEC-AQC-TAMIL

Automatic Speech Recognition (ASR) model that processes audio and transcribes spoken content into text.

low-resource-language

SSL_finetunning

Data2vec_aqc

spring_lab

IITM

tamil

ssl

0
4
3.52 GB
112

Updated 4 month(s) ago

DIGITAL INDIA BHASHINI DIVISION


SPRING-INX-DATA2VEC-AQC-BENGALI

Automatic Speech Recognition (ASR) model that processes audio and transcribes spoken content into text.

IITM

ssl

bengali

low-resource-languages

spring_lab

Data2vec_aqc

SSL_finetunning

0
5
3.52 GB
156

Updated 4 month(s) ago

DIGITAL INDIA BHASHINI DIVISION


SPRING-INX-DATA2VEC-AQC-BODO

Automatic Speech Recognition (ASR) model that processes audio and transcribes spoken content into text.

IITM

spring_lab

Data2vec_aqc

SSL_finetunning

low-resource-language

BODO

ssl

0
3
3.52 GB
177

Updated 4 month(s) ago

DIGITAL INDIA BHASHINI DIVISION


SPRING-INX-DATA2VEC-AQC-BHOJPURI

Automatic Speech Recognition (ASR) model that processes audio and transcribes spoken content into text.

SSL_finetunning

Data2vec_aqc

spring_lab

IITM

ssl

Bhojpuri

low-resource-language

0
7
3.52 GB
171

Updated 4 month(s) ago

DIGITAL INDIA BHASHINI DIVISION


SPRING-INX-DATA2VEC-AQC-MALAYALAM

Automatic Speech Recognition (ASR) model that processes audio and transcribes spoken content into text.

SSL_finetunning

ssl

malayalam

IITM

spring_lab

Data2vec_aqc

low-resource-language

0
5
3.52 GB
185

Updated 4 month(s) ago

DIGITAL INDIA BHASHINI DIVISION

