Nagamese Speech-to-Text

Automatic Speech Recognition (ASR) model for Nagamese speech, designed to transcribe spoken Nagamese into text for real-world usage.

About Model

This is an Automatic Speech Recognition (ASR) model for Nagamese, a widely spoken creole language of Northeast India. It takes 16 kHz audio as input and produces text transcriptions that reflect natural, conversational Nagamese speech. The system is built on the Whisper-Small architecture and was adapted to Nagamese using real speech recordings. It handles informal speech patterns, fillers, repetitions, and the everyday vocabulary commonly used by Nagamese speakers.

To improve fluency and transcription stability, the model was further refined using controlled synthetic speech data, while evaluation and validation were consistently performed on real Nagamese speech.

This model is intended for speech-to-text applications, accessibility tools, language technology research, and prototyping conversational and voice-enabled systems in Nagamese.
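Because the model expects 16 kHz input, recordings captured at other sample rates need resampling before transcription. The following is a minimal sketch of that step, not the authors' pipeline. The helper names `resample_linear` and `transcribe` are hypothetical, and `model_id` is a placeholder argument, since this page does not state the repository identifier. Linear interpolation is used only to keep the example dependency-free; a band-limited resampler would normally be preferable. The `transcribe` function assumes the weights load through the Hugging Face `transformers` ASR pipeline, which the page does not confirm.

```python
TARGET_RATE = 16_000  # the model card states the model processes 16 kHz audio


def resample_linear(samples, src_rate, dst_rate=TARGET_RATE):
    """Resample a mono signal by linear interpolation (illustrative only)."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples or src_rate == dst_rate:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step
        lo = min(int(pos), last)   # left neighbour, clamped to the signal
        hi = min(lo + 1, last)     # right neighbour, clamped to the signal
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


def transcribe(samples, sample_rate, model_id):
    """Hypothetical usage; needs `numpy`, `torch`, and `transformers` installed.

    `model_id` is an assumption: pass whatever identifier or local path
    the published weights actually use.
    """
    import numpy as np
    from transformers import pipeline

    audio = np.asarray(resample_linear(samples, sample_rate), dtype=np.float32)
    asr = pipeline("automatic-speech-recognition", model=model_id)
    return asr({"raw": audio, "sampling_rate": TARGET_RATE})["text"]
```

Only `resample_linear` runs without third-party packages; `transcribe` imports its dependencies lazily so the resampling helper can be used and tested on its own.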

Metadata

License

Attribution 4.0 International (CC BY 4.0)

Hosted By

MWirelabs

Task Type

Transformers

Model Format

PyTorch

Visibility

Open

Source Organisation

MWire Labs

Sector

Social

Updated Date & Time

23/01/26 13:03:49

Created By

Badal Nyalang

More Models from MWire Labs

Mizo OCR - Text Recognition for Mizo Language

OCR model for the Mizo language achieving 90.68% character accuracy on synthetic and curated printed text

Image-to-Text

trocr

Mizo

northeast-india

low-resource

OCR

Updated 3 month(s) ago

NE-OCR

NE-OCR is a multilingual Optical Character Recognition model developed by MWire Labs to accurately recognize printed text from documents in Northeast Indian languages. The model supports Assamese, Bodo, English, Garo, Hindi, Khasi, Kokborok, Meitei (Bengali script), Meitei (Meitei Mayek script), Mizo, Nagamese, and Nyishi. It is designed to enable reliable digitization of books, newspapers, government records, educational materials, and cultural archives from Northeast India where mainstream OCR

Nagamese

OCR

BODO

Optical Character Recognition

Mizo

khasi

northeast-india

doctr

Garo

Kokborok

Meitei

Nyishi

Printed Text Recognition

Northeast India OCR

Multilingual OCR

vitstr

Updated 3 month(s) ago

Garo OCR - Text Recognition for Garo

OCR model for the Garo language achieving 93.13% character accuracy.

Image-to-Text

OCR

Garo

northeast-india

florence-2

Updated 3 month(s) ago

Northeast Language Identification

NE-LID is a fast and accurate language identification model for Northeast Indian languages that uses character-level features. It is designed for low-resource, script-diverse text and achieves high accuracy on short sentences.

fastText

fasttext

MWire Labs

northeast-india

low-resource

language identification

Multilingual

Updated 5 month(s) ago

NortheastNER

NortheastNER is a token classification model built on XLM-RoBERTa and fine-tuned on ~25k sentences from gazetteers, news, and cultural texts across Northeast India. It detects region-specific entities: places, tribes, festivals, tourist sites, flora, fauna, and experimental local names. It is intended for low-resource NER, regional search, cultural analytics, and knowledge graph applications.

XLM-RoBERTa

northeast-india

Meghalaya

Northeast India

Conservation

Token Classification

NER

low-resource

Updated 7 month(s) ago

Kren-M

Northeast India's first AI language model. Kren-M is a 2.6B-parameter bilingual model for Khasi-English, built on Gemma-2-2B. It features the custom Kren-NE tokenizer, which covers 7 Northeast Indian languages (Khasi, Garo, Mizo, Assamese, Manipuri, Nagamese, Nyishi) with a 35.7% efficiency gain. Trained on 5.43M Khasi sentences. Capabilities: bidirectional translation, natural conversation, cultural context. Designed for language preservation across Northeast India.

Kren-M

bilingual

continued-pretraining

Northeast India

khasi

northeast-india

Garo

low-resource

Instruction-Tuning

Indian Languages

Tokenizer

Foundational model

Northeast India Languages

Updated 7 month(s) ago

NE-BERT

NE-BERT is Northeast India's first domain-specific multilingual foundation model. Built on the ModernBERT architecture and trained on 8.3 million sentences, it supports 9 regional languages: Assamese, Khasi, Garo, Manipuri (Meitei), Mizo, Nyishi, Nagamese, Kokborok, and Pnar. It achieves state-of-the-art performance on regional benchmarks and offers 1.6x faster inference, bridging the digital divide for low-resource languages.

Mizo

Masked Language Modeling

low-resource-NLP

Assamese

Garo

Nyishi

Meitei

Nagamese

northeast-india

khasi

A'chik

modernbert

mwirelabs

northeast bert

token-efficiency

kokborok

Pnar

Updated 7 month(s) ago

KhasiBERT

Khasi language model trained on 3.6M sentences using RoBERTa architecture. 110M parameters. Supports NLP tasks for Khasi text processing.

digital-india

endpoints_compatible

autotrain_compatible

Fill-Mask

Indian Language

low-resource

region:us

kha

khasi

Meghalaya

austroasiatic

masked-lm

roberta

foundational-model

Bert

safetensors

Updated 9 month(s) ago

Khasi English Semantic Search Model

Khasi-English semantic search model, trained on 66,794 pairs with 0.69-0.74 similarity. ~90MB, supports Meghalaya tourism/culture. By MWirelabs

khasi-culture

text-embeddings-inference

khasi

kha

license:cc0-1.0

sentence-transformers

semantic search

autotrain_compatible

cross-lingual

Sentence Similarity

safetensors

Meghalaya

Updated 10 month(s) ago
