
Aksharantar

Aksharantar is the largest publicly available transliteration dataset for 21 Indic languages

About Dataset

Dataset Summary

Aksharantar is the largest publicly available transliteration dataset for 21 Indic languages. The corpus contains 26 million transliteration pairs between Indic languages and English.

Languages


Assamese (asm)	Hindi (hin)	Maithili (mai)	Marathi (mar)	Punjabi (pan)	Tamil (tam)
Bengali (ben)	Kannada (kan)	Malayalam (mal)	Nepali (nep)	Sanskrit (san)	Telugu (tel)
Bodo (brx)	Kashmiri (kas)	Manipuri (mni)	Oriya (ori)	Sindhi (snd)	Urdu (urd)
Gujarati (guj)	Konkani (kok)	Dogri (doi)
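
For scripts that need to refer to these languages by code, the table above can be captured as a simple lookup. The following is a minimal sketch, not part of the dataset's own tooling: the names `AKSHARANTAR_LANGUAGES` and `language_name` are hypothetical, and the codes and names are copied from the table as listed on this page.

```python
# Three-letter language codes and names, copied from the Languages table.
# The dictionary name and the helper below are illustrative assumptions,
# not identifiers defined by the dataset or its authors.
AKSHARANTAR_LANGUAGES = {
    "asm": "Assamese",
    "ben": "Bengali",
    "brx": "Bodo",
    "doi": "Dogri",
    "guj": "Gujarati",
    "hin": "Hindi",
    "kan": "Kannada",
    "kas": "Kashmiri",
    "kok": "Konkani",
    "mai": "Maithili",
    "mal": "Malayalam",
    "mar": "Marathi",
    "mni": "Manipuri",
    "nep": "Nepali",
    "ori": "Oriya",
    "pan": "Punjabi",
    "san": "Sanskrit",
    "snd": "Sindhi",
    "tam": "Tamil",
    "tel": "Telugu",
    "urd": "Urdu",
}


def language_name(code: str) -> str:
    """Return the language name for a code, ignoring case and outer whitespace."""
    key = code.strip().lower()
    if key not in AKSHARANTAR_LANGUAGES:
        raise KeyError(f"unknown language code: {code!r}")
    return AKSHARANTAR_LANGUAGES[key]


if __name__ == "__main__":
    print(len(AKSHARANTAR_LANGUAGES), "languages listed")
    print(language_name("HIN"))
```

Keeping the mapping in one place makes it easy to validate a language code before using it to select a subset of the corpus.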

Dataset Metadata

License

Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)

Geographical coverage

National

Sector

Sector Agnostic

Author

Yash Madhani, Sushane Parthan, Priyanka Bedekar, Ruchi Khapra, Anoop Kunchukuttan, Pratyush Kumar, and Mitesh Shantadevi Khapra

Source Organisation

Uploaded by

Nikhil Narasimhan

Data Quality Score (Beta)

-

Dataset type

Structured

Frequency

Static

Time Granularity

NA

Year range

N.A.

Date & Time

04/08/25 09:56:34

Visibility

Open

Hosted / Redirected

Hosted

Tags

transliteration
multilingual corpus
Indic Languages

© 2026 - Copyright AIKosh. All rights reserved.