
Tamil, one of the world’s oldest surviving classical languages, is written in the Tamil script and is spoken widely in Tamil Nadu
Tamil, one of the world’s oldest surviving classical languages, is written in the Tamil script and is spoken in Tamil Nadu and parts of Sri Lanka. Its inclusion in the soketlabs/bhasha-wiki dataset enriches the dataset with content from a linguistically and culturally distinct part of India. Tamil brings a wealth of philosophical, historical, and literary content that supports the creation of inclusive and culturally aware AI systems.
Note on Encoding:
This dataset is encoded in UTF-8 format.
Windows users:
To ensure proper display of non-ASCII characters in Excel, first download the .csv file, open it in Notepad, choose File → Save As, and select UTF-8 with BOM . Then open the saved file in Excel.
macOS users:
You can open the CSV file directly in Excel or any spreadsheet software without any issues.
Attribution 4.0 International (CC BY- 4.0)
© 2026 - Copyright AIKosh. All rights reserved. This portal is developed by National e-Governance Division for AIKosh mission.