Government of India
Northeast India Tribes and Subtribes

Dataset of tribes and sub-tribes across Northeast India with details on regions, clans, languages, and linguistic families.

About Dataset

This dataset provides a structured compilation of tribes and sub-tribes across the eight states of Northeast India: Arunachal Pradesh, Assam, Manipur, Meghalaya, Mizoram, Nagaland, Tripura, and Sikkim. Each entry includes the state, tribe, sub-tribes or clans, regional distribution, languages spoken, and linguistic family classification. The dataset has been compiled from publicly available sources such as the Census of India, Ministry of Tribal Affairs documents, ethnographic studies, and community references. It is intended as a reference resource for cultural studies, linguistic analysis, education, and computational applications including natural language processing.
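Because each entry is a flat record (state, tribe, clans, region, languages, linguistic family), the file lends itself to simple tabular processing. The following is a minimal sketch that assumes the CSV header uses columns named `State`, `Tribe`, and `Linguistic Family`. These names are hypothetical, so check the actual header before use. Using only the Python standard library, it counts distinct tribes per linguistic family and groups tribes by state. The three inline rows are illustrative stand-ins, not records taken from the dataset.

```python
import csv
import io
from collections import Counter

# Hypothetical column names: the real header of
# Northeast_Tribes_with_Linguistic_Families.csv may differ.
STATE_COL = "State"
TRIBE_COL = "Tribe"
FAMILY_COL = "Linguistic Family"

# Illustrative rows only; not copied from the dataset.
SAMPLE_CSV = """State,Tribe,Linguistic Family
Meghalaya,Khasi,Austroasiatic
Meghalaya,Garo,Sino-Tibetan
Mizoram,Mizo,Sino-Tibetan
"""


def tribes_per_family(rows):
    """Count distinct tribes for each linguistic family."""
    seen = set()
    counts = Counter()
    for row in rows:
        key = (row[TRIBE_COL].strip(), row[FAMILY_COL].strip())
        # Count each (tribe, family) pair once, even if it spans several rows.
        if key not in seen:
            seen.add(key)
            counts[key[1]] += 1
    return counts


def tribes_by_state(rows):
    """Group tribe names under their state, preserving file order."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row[STATE_COL].strip(), []).append(row[TRIBE_COL].strip())
    return grouped


# For the real file, use open(path, newline="", encoding="utf-8") instead of StringIO.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(tribes_per_family(rows))
print(tribes_by_state(rows))
```

To run the sketch on the downloaded file, replace the `io.StringIO` source with `open("Northeast_Tribes_with_Linguistic_Families.csv", newline="", encoding="utf-8")`. Counting distinct (tribe, family) pairs avoids inflating totals if a tribe is spread over several rows, for example one row per sub-tribe.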

Activity Overview

  • Downloads 0
  • Downloads 9
  • Views 281
  • File Size 13.99 KB

Tags

  • northeast-india
  • Meghalaya
  • Arunachal Pradesh
  • Sikkim
  • Nagaland
  • Tripura
  • Mizoram
  • Manipur

License Control

Attribution 4.0 International (CC BY 4.0)

Version Control

Version 1 (13.99 KB)
  • admin · 5 month(s) ago
    • Northeast_Tribes_with_Linguistic_Families.csv (text/csv)

Related Models Related Models

NortheastNER
NortheastNER is a token classification model built on XLM-RoBERTa and fine-tuned on about 25k sentences drawn from gazetteers, news, and cultural texts across Northeast India. It detects region-specific entities, including places, tribes, festivals, tourist sites, flora, and fauna, with experimental support for local names. It is suited to low-resource NER, regional search, cultural analytics, and knowledge graph applications.
Token Classification
NER
northeast-india
low-resource
XLM-RoBERTa
Meghalaya
Conservation
Northeast India
  • Upvoters 0
  • Downloads 9
  • File Size 0
  • Views 113
Updated 2 month(s) ago

MWIRE LABS
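A token classification model such as NortheastNER emits one label per token, and those labels then have to be merged into entity spans. The sketch below assumes a BIO tagging scheme (B- begins an entity, I- continues it, O is outside) and uses hypothetical labels such as `B-FESTIVAL` and `B-PLACE`. The model's actual label set and tagging scheme are not documented on this page. It illustrates only the span-merging step, not the model itself.

```python
from typing import List, Tuple

# Hypothetical labels; NortheastNER's real label set is not listed on this page.
# Assumes BIO tagging: B-X starts an entity of type X, I-X continues it, O is outside.


def bio_to_spans(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Merge token-level BIO labels into (entity_type, text) spans."""
    spans: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the open entity, if any, and reset the buffer.
        nonlocal current_type, current_tokens
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            flush()
            current_type, current_tokens = label[2:], [token]
        elif label.startswith("I-") and current_type == label[2:]:
            current_tokens.append(token)
        elif label.startswith("I-"):
            # An I- tag with no matching open entity is treated as a new entity.
            flush()
            current_type, current_tokens = label[2:], [token]
        else:
            flush()
    flush()
    return spans


tokens = ["The", "Wangala", "festival", "of", "the", "Garo", "Hills", "draws", "visitors"]
labels = ["O", "B-FESTIVAL", "O", "O", "O", "B-PLACE", "I-PLACE", "O", "O"]
print(bio_to_spans(tokens, labels))
```

XLM-RoBERTa splits words into subword pieces, so real model output must first be aligned back to words before a merge like this. If the model is loaded through Hugging Face `transformers`, `pipeline("token-classification", ..., aggregation_strategy="simple")` performs comparable grouping automatically.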

Kren v1: Khasi Generative Language Model
Kren v1 is the first Khasi generative language model, trained on 1M lines, pioneering encoder-to-decoder adaptation for low-resource AI.
MWire Labs
khasi
Low-Resource NLP
khasi-culture
Indigenous Language
Northeast India
Encoder-to-Decoder
AI for Culture
Natural Language Processing
Meghalaya
  • Upvoters 0
  • Downloads 0
  • File Size 390.67 MB
  • Views 8
Updated 5 month(s) ago

MWIRE LABS