
Gujarati is written in the Gujarati script and is widely spoken in the western Indian state of Gujarat, as well as among global diaspora communities. Its presence in the soketlabs/bhasha-wiki dataset supports the development of AI systems that understand regional business, trade, and cultural narratives. The language’s economic significance and literary contributions make it essential for comprehensive multilingual model training.
Note on Encoding:
This dataset is encoded in UTF-8 format.
Windows users:
To ensure that non-ASCII characters display correctly in Excel, first download the .csv file, open it in Notepad, choose File → Save As, and select UTF-8 with BOM as the encoding. Then open the saved file in Excel.
macOS users:
You can open the CSV file directly in Excel or any spreadsheet software without any issues.
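The Notepad re-save step for Windows can also be scripted. The following is a minimal Python sketch, not part of the dataset's own tooling: the function name add_utf8_bom and the file names in the usage note are hypothetical. It copies the file byte for byte and prepends the UTF-8 byte-order mark only if one is not already present.

```python
import codecs
from pathlib import Path


def add_utf8_bom(src, dst):
    """Copy a UTF-8 file to dst, prepending a byte-order mark if it is missing.

    Hypothetical helper for illustration; Excel on Windows uses the BOM
    to detect that a CSV file is UTF-8.
    """
    data = Path(src).read_bytes()
    # Fail early with UnicodeDecodeError if the input is not valid UTF-8.
    data.decode("utf-8")
    # Avoid adding a second BOM when the file already starts with one.
    if not data.startswith(codecs.BOM_UTF8):
        data = codecs.BOM_UTF8 + data
    Path(dst).write_bytes(data)
```

For example, add_utf8_bom("gujarati.csv", "gujarati_bom.csv") would write a copy that Excel can open directly, assuming the downloaded file is named gujarati.csv. Working on raw bytes leaves line endings and the Gujarati text itself unchanged.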
Creative Commons Attribution 4.0 International (CC BY 4.0)
© 2026 - Copyright AIKosh. All rights reserved. This portal is developed by the National e-Governance Division for the AIKosh mission.