Government Of India
bhasha‑wiki-Gujrati

About Dataset

Gujarati is written in the Gujarati script and is widely spoken in the western Indian state of Gujarat, as well as among global diaspora communities. Its presence in the soketlabs/bhasha-wiki dataset supports the development of AI systems that understand regional business, trade, and cultural narratives. The language’s economic significance and literary contributions make it essential for comprehensive multilingual model training.


Note on Encoding:
This dataset is encoded in UTF-8 format.

  • Windows users:
    To display non-ASCII characters correctly in Excel, download the .csv file, open it in Notepad, choose File → Save As, and select the encoding "UTF-8 with BOM". Then open the saved file in Excel.

  • macOS users:
    You can open the CSV file directly in Excel or any other spreadsheet software without any issues.
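
As a scripted alternative to the Notepad steps above, the same conversion can be sketched in Python. This is a minimal sketch, not part of the dataset: the function name add_utf8_bom and the file names in the usage comment are hypothetical. It copies the file byte for byte and only prepends the UTF-8 byte-order mark if one is not already there.

```python
import codecs
from pathlib import Path


def add_utf8_bom(src, dst):
    """Copy a UTF-8 file to dst, prefixed with a UTF-8 byte-order mark (BOM).

    Hypothetical helper for illustration; src and dst are file paths.
    """
    data = Path(src).read_bytes()
    # Fail early if the input is not valid UTF-8 (raises UnicodeDecodeError).
    data.decode("utf-8")
    # Avoid writing a second BOM if the file already starts with one.
    if not data.startswith(codecs.BOM_UTF8):
        data = codecs.BOM_UTF8 + data
    Path(dst).write_bytes(data)


# Example usage (file names are placeholders):
# add_utf8_bom("bhasha-wiki-gujarati.csv", "bhasha-wiki-gujarati-bom.csv")
```

Working on raw bytes leaves line endings and cell contents untouched, so the only difference between the two files is the three-byte BOM at the start.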

Activity Overview

  • Downloads 2
  • Redirect 14
  • Views 300
  • File Size 0

Tags

  • Gujarati
  • multilingual NLP
  • indicnlp
  • natural language processing (NLP)
  • language-diversity
  • language

License Control

Attribution 4.0 International (CC BY 4.0)