Kinnauri-Pahari - Parallel Monolingual Dataset

The Kinnauri-Pahari dataset is a collection of general-domain corpora for one of seven endangered languages found in Himachal Pradesh, India. It contains both monolingual and parallel sentences.

About Dataset

This dataset is designed to facilitate the comparison and analysis of the Hindi and Kinnauri languages. It has two parts: a monolingual corpus of Kinnauri text and a parallel corpus of translated Hindi and Kinnauri sentences. Together, the texts and translations show the linguistic differences between the two languages side by side. The dataset is likely intended for linguistic research, language learning, and cultural exchange purposes.

Citation: Saxena, Shefali, Shweta Chauhan, and Philemon Daniel. "Kinnauri-Pahari (version_0.1): parallel, monolingual dataset and word-embeddings." Sādhanā 47, no. 3 (2022): 123.

This dataset was identified and facilitated for onboarding as part of the Dataset Onboarding Support Team (DOST) initiative led by CivicDataLab (CDL), partnering with the Gates Foundation in collaboration with BHASHINI. CivicDataLab provided technical support for dataset discovery, validation, metadata preparation and onboarding facilitation. All dataset ownership and intellectual property rights remain with the original author(s).

Purpose of Dataset

The purpose of this dataset is to facilitate the comparison, analysis, and preservation of the Hindi and Kinnauri languages through bilingual text data. The dataset enables researchers and scholars to study the linguistic similarities and differences between the two languages, including vocabulary, grammar, syntax, and regional language patterns. It is particularly relevant for linguistic research, language learning, cultural studies, and the development of multilingual language technologies for low-resource Himalayan languages. The dataset also supports the creation of educational resources, translation systems, and language learning tools while helping preserve the linguistic and cultural heritage of the Indian Himalayan region.

Activity Overview

  • Downloads 0
  • Downloads 1
  • File Size 9.98 MB
  • Views 8

Tags

  • Multilingual Text
  • Low Resource Languages
  • Multilingual NLP
  • Hindi Language
  • Kinnauri Language
  • Himachal Pradesh
  • Language Translation

License Control

Attribution 4.0 International (CC BY 4.0)

Version Control

Version 1 (9.98 MB)
  • Nikil Augustine · 4 day(s) ago
    • hindi_kinnauri(1).csv (text/csv)
    • Monolingual_KP.csv (text/csv)
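
The parallel file listed above could be read into sentence pairs along the following lines. This is a minimal sketch, not a documented loader: the page does not describe the CSV layout, so the header row, the UTF-8 encoding, and the use of the first two columns as the sentence pair are all assumptions, and `read_sentence_pairs` is a hypothetical helper name. Check the actual column order in the file before relying on which side is Hindi and which is Kinnauri.

```python
import csv
from typing import Iterable, Iterator, Tuple


def read_sentence_pairs(
    lines: Iterable[str], has_header: bool = True
) -> Iterator[Tuple[str, str]]:
    """Yield (first_column, second_column) pairs from CSV lines.

    ASSUMPTION: the parallel sentences sit in the first two columns.
    The real column names and order are not stated on the dataset page.
    """
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # skip the header row (assumed to exist)
    for row in reader:
        if len(row) < 2:
            continue  # skip rows that lack a second column
        first, second = row[0].strip(), row[1].strip()
        if first and second:  # drop pairs with an empty side
            yield first, second
```

To use it on the downloaded file, open it with `open("hindi_kinnauri(1).csv", newline="", encoding="utf-8")` and pass the file object to `read_sentence_pairs`. Working from an iterable of lines rather than a path keeps the helper easy to try on a few sample rows first.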