Government Of India

Bhashini - IndicNER

IndicNER is a multilingual Named Entity Recognition model fine-tuned on 11 Indian languages to identify named entities in text.

About Model

IndicNER is a multilingual Named Entity Recognition (NER) model developed by AI4Bharat and published on Bhashini. It identifies and classifies named entities (persons, organizations, and locations) in text written in 11 Indian languages:
Hindi, Bengali, Tamil, Telugu, Gujarati, Punjabi, Marathi, Assamese, Kannada, Malayalam and Oriya (Odia).
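NER models of this kind usually emit one tag per word in the BIO scheme: B- begins an entity, I- continues it, and O marks a word outside any entity. The sketch below groups such tags into entity spans. It assumes a label set of B-/I-PER, B-/I-ORG and B-/I-LOC, and the function name `bio_to_spans` is hypothetical; it is not part of any IndicNER or Bhashini API.

```python
from typing import List, Optional, Tuple


def bio_to_spans(words: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group word-level BIO tags into (entity_text, entity_type) pairs.

    Assumes tags look like "B-PER", "I-PER", "B-LOC", "O", etc.
    (hypothetical label set, not confirmed for IndicNER).
    """
    spans: List[Tuple[str, str]] = []
    current_words: List[str] = []
    current_type: Optional[str] = None

    for word, tag in zip(words, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # A new entity starts; a stray I- tag is treated like a B- tag.
            if current_words:
                spans.append((" ".join(current_words), current_type))
            current_words, current_type = [word], tag[2:]
        elif tag.startswith("I-"):
            # Continuation of the entity that is currently open.
            current_words.append(word)
        else:
            # An "O" tag closes any open entity.
            if current_words:
                spans.append((" ".join(current_words), current_type))
            current_words, current_type = [], None

    # Flush an entity that runs to the end of the sentence.
    if current_words:
        spans.append((" ".join(current_words), current_type))
    return spans
```

For the Hindi sentence "सचिन तेंदुलकर मुंबई में रहते हैं" ("Sachin Tendulkar lives in Mumbai") tagged B-PER I-PER B-LOC O O O, `bio_to_spans` returns `[("सचिन तेंदुलकर", "PER"), ("मुंबई", "LOC")]`. Treating an orphan I- tag as the start of a new entity is a common lenient convention; a stricter decoder could discard it instead.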

Training Dataset:
The model is fine-tuned on millions of sentences mined from the Samanantar Corpus, the largest publicly available parallel corpus for Indian languages, and is benchmarked on a human-annotated test set and several publicly available Indian NER datasets. The base model used for fine-tuning is BERT-base-multilingual-uncased, so a single multilingual encoder serves all 11 languages.
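Fine-tuning a BERT-style model for NER requires mapping word-level labels onto WordPiece sub-tokens, because the tokenizer may split one word into several pieces. A common recipe (not confirmed as AI4Bharat's exact procedure) labels only the first sub-token of each word and marks the remaining sub-tokens, along with special tokens such as [CLS] and [SEP], with -100 so that the loss ignores them. The `word_ids` input mirrors what Hugging Face fast tokenizers return from `BatchEncoding.word_ids()`; the function name `align_labels` and the integer label ids are hypothetical.

```python
from typing import List, Optional

# PyTorch's CrossEntropyLoss skips targets equal to -100 by default (ignore_index=-100).
IGNORE_INDEX = -100


def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Project word-level label ids onto sub-tokens.

    word_ids[i] is the index of the word that sub-token i came from,
    or None for special tokens.
    """
    aligned: List[int] = []
    previous: Optional[int] = None
    for wid in word_ids:
        if wid is None:
            aligned.append(IGNORE_INDEX)       # special token, e.g. [CLS] or [SEP]
        elif wid != previous:
            aligned.append(word_labels[wid])   # first sub-token of a word keeps its label
        else:
            aligned.append(IGNORE_INDEX)       # later sub-tokens of the same word
        previous = wid
    return aligned
```

For example, with `word_ids = [None, 0, 0, 1, 2, 2, None]` and `word_labels = [1, 2, 0]`, the function returns `[-100, 1, -100, 2, 0, -100, -100]`. Labelling every sub-token with its word's tag is an equally common alternative; the first-sub-token choice keeps one prediction per word, which matches the word-level BIO decoding used at inference time.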

Use Cases:
IndicNER can be used for a wide range of Natural Language Processing (NLP) applications, including:

1. Automated document processing – Extracting key entities from government, legal, and business documents.
2. Chatbots and virtual assistants – Enhancing conversational AI by identifying user queries related to people, places, and organizations.
3. News and content analysis – Automatically tagging and categorizing entities in multilingual news articles.
4. Healthcare and medical records – Identifying names of patients, organizations, and places in records to support structured data extraction.

For more details and implementation, visit: https://huggingface.co/ai4bharat/IndicNER.




Metadata

MIT

AI4Bharat

Named Entity Recognition (NER) Model

Other

Open

Sector Agnostic

05/03/25 15:23:12

Admin

591.28 MB

Activity Overview

  • Downloads1
  • Downloads 101
  • Views 2,162
  • File Size 591.12 MB

Tags

  • Multilingual
  • Foreigners
  • NLP
  • Transformer
  • Token Classification
  • Pytorch
  • Samanantar
  • Bert
  • NER

License Control

MIT

Version Control

Version 2 (591.12 MB)
  • admin · 1 year(s) ago
    • config.json (application/json)
    • pytorch_model.bin
    • README.md (text/markdown)
    • special_tokens_map.json (application/json)
    • tokenizer_config.json (application/json)
    • tokenizer.json (application/json)
    • vocab.txt (text/plain)
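Before loading a downloaded copy of the model locally, it can help to confirm that every file from the Version 2 listing is present. A minimal sketch, assuming the files were saved flat into a single directory; the function `missing_files` and that directory layout are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List

# File names taken from the Version 2 file listing of this model.
EXPECTED_FILES = (
    "config.json",
    "pytorch_model.bin",
    "README.md",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "tokenizer.json",
    "vocab.txt",
)


def missing_files(model_dir: str, expected: Iterable[str] = EXPECTED_FILES) -> List[str]:
    """Return the expected file names that are not regular files in model_dir."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]
```

Once `missing_files(...)` returns an empty list, the directory should be loadable with the Hugging Face `transformers` classes `AutoTokenizer.from_pretrained` and `AutoModelForTokenClassification.from_pretrained`, both of which accept a local directory path; that loading step is not exercised here.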

More Models from TechCorp

SANTHAM-Gemma3-4B-SH-Seg-Poetry-Finetuned
SANTHAM-Gemma3-4B-SH-Seg-Poetry-Finetuned is a model for translating Sanskrit into Tamil, specialized for segmented text produced by the Sanskrit Heritage segmenter.
translation
poetry
santham
Segmened
language:tam
language:san
  • Upvoters 0
  • Downloads 0
  • File Size 115.62 MB
  • Views 30
Updated 7 day(s) ago

DIGITAL INDIA BHASHINI DIVISION

SANTHAM-Gemma3-4B-Finetuned
SANTHAM-Gemma3-4B-Finetuned is a Sanskrit → Tamil translation model built on the Gemma 3 (4B) architecture. It is trained on a parallel corpus developed as part of the Sanskrit Knowledge Accessor project, enabling it to capture linguistic nuances and generate fluent Tamil translations from classical Sanskrit inputs.
translation
language:san
language:tam
santham
  • Upvoters 0
  • Downloads 2
  • File Size 2.08 GB
  • Views 46
Updated 7 day(s) ago

DIGITAL INDIA BHASHINI DIVISION

SANTHAM-Gemma3-4B-Anvaya-Poetry-Finetuned
SANTHAM-Gemma3-4B-Anvaya-Poetry-Finetuned is a model for translating Sanskrit into Tamil, specialized for anvaya-based translation of poetry.
poetry
santham
anvaya
language:tam
language:san
translation
  • Upvoters 0
  • Downloads 0
  • File Size 2.09 GB
  • Views 21
Updated 7 day(s) ago

DIGITAL INDIA BHASHINI DIVISION

SPRING-INX-DATA2VEC-AQC-URDU
Automatic Speech Recognition (ASR) model that transcribes spoken Urdu audio into text.
spring_lab
Data2vec_aqc
low-resource-language
SSL_finetunning
ssl
urdu
IITM
  • Upvoters 0
  • Downloads 0
  • File Size 3.52 GB
  • Views 49
Updated 1 month(s) ago

DIGITAL INDIA BHASHINI DIVISION

SPRING-INX-DATA2VEC-AQC-TELUGU
Automatic Speech Recognition (ASR) model that transcribes spoken Telugu audio into text.
spring_lab
low-resource-language
SSL_finetunning
Data2vec_aqc
IITM
telugu
ssl
  • Upvoters 0
  • Downloads 0
  • File Size 3.52 GB
  • Views 35
Updated 1 month(s) ago

DIGITAL INDIA BHASHINI DIVISION

SPRING-INX-DATA2VEC-AQC-TAMIL
Automatic Speech Recognition (ASR) model that transcribes spoken Tamil audio into text.
low-resource-language
SSL_finetunning
Data2vec_aqc
spring_lab
IITM
tamil
ssl
  • Upvoters 0
  • Downloads 1
  • File Size 3.52 GB
  • Views 36
Updated 1 month(s) ago

DIGITAL INDIA BHASHINI DIVISION

SPRING-INX-DATA2VEC-AQC-BENGALI
Automatic Speech Recognition (ASR) model that transcribes spoken Bengali audio into text.
Data2vec_aqc
IITM
spring_lab
ssl
low-resource-languages
SSL_finetunning
bengali
  • Upvoters 0
  • Downloads 0
  • File Size 3.52 GB
  • Views 80
Updated 1 month(s) ago

DIGITAL INDIA BHASHINI DIVISION

SPRING-INX-DATA2VEC-AQC-BODO
Automatic Speech Recognition (ASR) model that transcribes spoken Bodo audio into text.
IITM
spring_lab
SSL_finetunning
low-resource-language
BODO
Data2vec_aqc
ssl
  • Upvoters 0
  • Downloads 0
  • File Size 3.52 GB
  • Views 110
Updated 1 month(s) ago

DIGITAL INDIA BHASHINI DIVISION

SPRING-INX-DATA2VEC-AQC-BHOJPURI
Automatic Speech Recognition (ASR) model that transcribes spoken Bhojpuri audio into text.
SSL_finetunning
Data2vec_aqc
spring_lab
IITM
ssl
Bhojpuri
low-resource-language
  • Upvoters 0
  • Downloads 4
  • File Size 3.52 GB
  • Views 94
Updated 1 month(s) ago

DIGITAL INDIA BHASHINI DIVISION

SPRING-INX-DATA2VEC-AQC-MALAYALAM
Automatic Speech Recognition (ASR) model that transcribes spoken Malayalam audio into text.
SSL_finetunning
ssl
malayalam
IITM
spring_lab
Data2vec_aqc
low-resource-language
  • Upvoters 0
  • Downloads 0
  • File Size 3.52 GB
  • Views 106
Updated 1 month(s) ago

DIGITAL INDIA BHASHINI DIVISION