
Bengali, or Bangla, uses the Bengali script and is predominantly spoken in West Bengal, Tripura, and Bangladesh. Known for its rich literary heritage and cultural influence, Bengali adds substantial depth to the soketlabs/bhasha-wiki dataset. Including Bengali helps equip language models to understand and generate content for millions of speakers in eastern India and Bangladesh, supporting cross-border language understanding.
Note on Encoding:
This dataset is encoded in UTF-8 format.
Windows users:
To ensure proper display of non-ASCII characters in Excel, first download the .csv file, open it in Notepad, choose File → Save As, and select "UTF-8 with BOM" as the encoding. Then open the saved file in Excel.
macOS users:
You can open the CSV file directly in Excel or any spreadsheet software without any issues.
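For users who prefer a script to the Notepad steps above, the same re-save can be sketched in Python. This is a minimal sketch, not part of the dataset's tooling: the function name add_utf8_bom and the file names in the usage comment are hypothetical, and it assumes the downloaded file is valid UTF-8 as stated in the encoding note.

```python
import codecs
from pathlib import Path


def add_utf8_bom(src: str, dst: str) -> None:
    """Copy a UTF-8 CSV to dst, prepending a byte order mark if it is missing."""
    data = Path(src).read_bytes()
    # Fail early if the file is not valid UTF-8 (assumption: it should be).
    data.decode("utf-8")
    # Only add the BOM when the file does not already start with one.
    if not data.startswith(codecs.BOM_UTF8):
        data = codecs.BOM_UTF8 + data
    Path(dst).write_bytes(data)


# Hypothetical usage; replace the file names with your own:
# add_utf8_bom("bengali.csv", "bengali_with_bom.csv")
```

Working on raw bytes leaves the line endings and the Bengali text untouched; the only change is the three-byte marker at the start of the file, which Excel on Windows uses to detect the encoding.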
Attribution 4.0 International (CC BY 4.0)
© 2026 - Copyright AIKosh. All rights reserved. This portal is developed by National e-Governance Division for AIKosh mission.