Indian Flag
Government Of India
A-
A
A+
bhasha‑wiki-Tamil

bhasha‑wiki-Tamil

Tamil, one of the world’s oldest surviving classical languages, is written in the Tamil script and is spoken widely in Tamil Nadu

About Dataset

Tamil, one of the world’s oldest surviving classical languages, is written in the Tamil script and is spoken in Tamil Nadu and parts of Sri Lanka. Its inclusion in the soketlabs/bhasha-wiki dataset enriches the dataset with content from a linguistically and culturally distinct part of India. Tamil brings a wealth of philosophical, historical, and literary content that supports the creation of inclusive and culturally aware AI systems.


Note on Encoding:
This dataset is encoded in UTF-8 format.

  • Windows users:
    To ensure proper display of non-ASCII characters in Excel, first download the .csv file, open it in Notepad, choose File → Save As, and select UTF-8 with BOM . Then open the saved file in Excel.

  • macOS users:
    You can open the CSV file directly in Excel or any spreadsheet software without any  issues.

Activity Overview Activity Overview

  • Downloads0
  • Redirect 8
  • Views 102
  • File Size 0

Tags Tags

  • Tamil
  • cross-lingual NLP
  • multilingual NLP
  • indicnlp
  • natural language processing (NLP)
  • language-diversity
  • language research
  • multi-modal language resources

License Control License Control

Attribution 4.0 International (CC BY- 4.0)