Government Of India
bhasha‑wiki-Bengali

About Dataset

Bengali, or Bangla, is written in the Bengali script and is predominantly spoken in West Bengal, Tripura, and Bangladesh. Known for its rich literary heritage and cultural influence, Bengali adds substantial depth to the soketlabs/bhasha-wiki dataset. Including it helps language models understand and generate content for millions of speakers in eastern India and Bangladesh, supporting cross-border language understanding.


Note on Encoding:
This dataset is encoded in UTF-8 format.

  • Windows users:
    To ensure non-ASCII characters display properly in Excel, first download the .csv file, open it in Notepad, choose File → Save As, and select the encoding UTF-8 with BOM. Then open the saved file in Excel.

  • macOS users:
    You can open the CSV file directly in Excel or any other spreadsheet software without issues.
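
The Notepad step for Windows users can also be done with a script. The following is a minimal sketch, not part of the dataset's own tooling: the helper name `add_utf8_bom` is hypothetical, and the file names in the usage note are placeholders. It copies the file byte for byte and prepends the UTF-8 byte-order mark only if it is missing.

```python
import codecs
from pathlib import Path


def add_utf8_bom(src: str, dst: str) -> None:
    """Copy a UTF-8 CSV to dst, making sure it starts with a UTF-8 BOM.

    Hypothetical helper for illustration. Working on raw bytes keeps
    line endings and all other content exactly as downloaded.
    """
    data = Path(src).read_bytes()

    # Fail early if the input is not valid UTF-8, rather than
    # writing out a file that Excel would still display incorrectly.
    data.decode("utf-8")

    # Prepend the BOM (bytes EF BB BF) only when it is not already there,
    # so running the helper twice never produces a double BOM.
    if not data.startswith(codecs.BOM_UTF8):
        data = codecs.BOM_UTF8 + data

    Path(dst).write_bytes(data)
```

For example, `add_utf8_bom("downloaded.csv", "downloaded_bom.csv")` writes a copy that Excel on Windows should detect as UTF-8; substitute the actual name of the downloaded file.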

Activity Overview

  • Downloads 0
  • Redirect 11
  • Views 152
  • File Size 0

Tags

  • Bengali
  • indicnlp
  • natural language processing (NLP)
  • multilingual corpus
  • multi-modal language resources

License Control

Attribution 4.0 International (CC BY 4.0)