IIIT-INDIC-HW-WORDS


A large-scale handwritten word dataset for 10 Indian languages, supporting OCR and multilingual handwriting recognition research

About Dataset

The IIIT-INDIC-HW-WORDS dataset is a large-scale collection of handwritten word images spanning 10 major Indian languages. It has been curated to support research in handwriting recognition, OCR (Optical Character Recognition), and multilingual document analysis.

  • Languages Covered: Bengali, Hindi, Gujarati, Kannada, Malayalam, Odia, Punjabi (Gurmukhi), Tamil, Telugu, Urdu.

  • Content: Word-level handwritten samples contributed by diverse native speakers, ensuring variations in handwriting style, stroke patterns, and writing speed.

  • Format: Each word is stored as an image with its corresponding Unicode ground truth.

  • Scale: The dataset includes hundreds of thousands of word images, making it one of the most comprehensive handwritten corpora for Indian scripts.

  • Purpose:

    • Training and benchmarking handwriting recognition models.

    • Developing multilingual OCR systems.

    • Supporting cross-lingual and script-independent handwriting research.

  • Applications: Digital archiving, document digitization, educational tools, accessibility technologies, and AI-driven handwriting analysis.
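
The image-plus-Unicode-label pairing described under Format can be sketched as below. This is a minimal illustration only: the page does not describe how the ground truth is actually stored, so the tab-separated "image path, label" line layout, the sample paths, and the function name `parse_annotations` are all assumptions made for the example.

```python
import unicodedata
from typing import Iterable, List, Tuple


def parse_annotations(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse hypothetical '<image path><TAB><Unicode label>' lines.

    The line layout is an assumption; adapt it to the real ground-truth files.
    """
    pairs = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        path, sep, label = line.partition("\t")
        if not sep:
            raise ValueError(f"expected a tab-separated line, got: {line!r}")
        # NFC normalization keeps canonically equivalent labels identical,
        # which matters when comparing predictions against ground truth.
        pairs.append((path.strip(), unicodedata.normalize("NFC", label.strip())))
    return pairs


# Illustrative in-memory sample (paths and words are made up for the sketch).
sample = ["hindi/0001.jpg\tभारत\n", "\n", "tamil/0002.jpg\tதமிழ்\n"]
pairs = parse_annotations(sample)
```

Normalizing labels at load time is a design choice rather than a requirement of the dataset: it avoids counting two encodings of the same visible word as a recognition error.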

This dataset addresses the diversity and complexity of Indian scripts (abugida structure, conjunct consonants, diacritics, and multiple zones) and serves as a valuable resource for the research community.
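
To make the script complexity mentioned above concrete, the following sketch inspects a Unicode label: it guesses the script from standard Unicode block ranges and counts combining marks (vowel signs, viramas, and similar). The helper names `detect_script` and `count_marks` are hypothetical and not part of the dataset; the mapping of Urdu to the basic Arabic block is a simplification, since Urdu text can also use characters outside that block.

```python
import unicodedata
from typing import Optional

# Unicode block ranges for the scripts of the ten listed languages.
SCRIPT_RANGES = {
    "Arabic (Urdu)": (0x0600, 0x06FF),
    "Devanagari (Hindi)": (0x0900, 0x097F),
    "Bengali": (0x0980, 0x09FF),
    "Gurmukhi (Punjabi)": (0x0A00, 0x0A7F),
    "Gujarati": (0x0A80, 0x0AFF),
    "Odia": (0x0B00, 0x0B7F),
    "Tamil": (0x0B80, 0x0BFF),
    "Telugu": (0x0C00, 0x0C7F),
    "Kannada": (0x0C80, 0x0CFF),
    "Malayalam": (0x0D00, 0x0D7F),
}


def detect_script(label: str) -> Optional[str]:
    """Return the script whose block contains the most code points of label."""
    counts = {}
    for ch in label:
        cp = ord(ch)
        for name, (lo, hi) in SCRIPT_RANGES.items():
            if lo <= cp <= hi:
                counts[name] = counts.get(name, 0) + 1
                break
    return max(counts, key=counts.get) if counts else None


def count_marks(label: str) -> int:
    """Count combining marks (general categories Mn and Mc) in label."""
    return sum(1 for ch in label if unicodedata.category(ch) in ("Mn", "Mc"))


# A Devanagari word with two conjuncts, written with explicit escapes:
# KA, VIRAMA, SSA, TA, VIRAMA, RA, VOWEL SIGN I, YA
word = "\u0915\u094d\u0937\u0924\u094d\u0930\u093f\u092f"
script = detect_script(word)
marks = count_marks(word)
```

The example word has eight code points, three of which are combining marks, which is why character-level metrics for Indic scripts depend on whether they count code points or visual clusters.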

Activity Overview

  • Downloads 0
  • Views 184
  • File Size 0

Tags

  • OCR
  • Handwriting
  • Indic Languages

License Control

Attribution 4.0 International (CC BY 4.0)

Data Quality Score (Beta)

Version Control

Version 1 (0)
  • admin · 5 months ago
  • No files found.