Government Of India
bhasha‑wiki-Hindi
About Dataset

Hindi is written in the Devanagari script and is the most widely spoken language in India, primarily used across the northern and central regions. Its inclusion in the soketlabs/bhasha-wiki dataset ensures that language models can learn from and cater to a broad user base, covering general knowledge, government communication, education, and popular media. With its deep cultural significance and widespread usage, Hindi is a cornerstone for building Indic-aware AI systems.


Note on Encoding:
This dataset is encoded in UTF-8 format.

  • Windows users:
    To display non-ASCII characters correctly in Excel, first download the .csv file, open it in Notepad, choose File → Save As, and select "UTF-8 with BOM" as the encoding. Then open the saved file in Excel.

  • macOS users:
    You can open the CSV file directly in Excel or any other spreadsheet software without issues.
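
For readers who prefer to script the Windows step rather than use Notepad, the re-save can be sketched in Python. This is a minimal sketch, not part of the dataset's own tooling: the function name `add_utf8_bom` and the file names in the usage line are hypothetical, and it assumes the downloaded file is valid UTF-8 as stated above.

```python
def add_utf8_bom(src: str, dst: str) -> None:
    """Re-save a UTF-8 CSV with a byte order mark (BOM) so Excel on Windows detects the encoding.

    `src` and `dst` are hypothetical paths chosen by the caller.
    """
    # Reading with "utf-8-sig" drops a BOM if one is already present,
    # so the output never ends up with two. newline="" keeps the
    # original line endings untouched.
    with open(src, "r", encoding="utf-8-sig", newline="") as f:
        text = f.read()
    # Writing with "utf-8-sig" prepends the BOM bytes EF BB BF.
    with open(dst, "w", encoding="utf-8-sig", newline="") as f:
        f.write(text)
```

Usage, with placeholder file names: `add_utf8_bom("hindi.csv", "hindi_bom.csv")`, then open `hindi_bom.csv` in Excel. Writing to a separate output file leaves the original download unchanged.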

Activity Overview

  • Downloads 0
  • Redirect 24
  • Views 146
  • File Size 0

Tags

  • cross-lingual NLP
  • multilingual NLP
  • indicnlp
  • natural language processing (NLP)
  • Hindi
  • language research

License Control

Attribution 4.0 International (CC BY 4.0)