Mustard Dataset (Table Structure Recognition)

MUSTARD (Multilingual Scanned and Scene Table Structure Recognition Dataset)

About Dataset

MUSTARD (Multilingual Scanned and Scene Table Structure Recognition Dataset) is a diverse dataset curated for table structure recognition across multiple languages. The dataset consists of tables extracted from magazines, including printed, scanned, and scene-text tables, labeled with Optimized Table Structure Language (OTSL) sequences. It is designed to facilitate research in multilingual table structure recognition, particularly for non-English documents.
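
As a rough illustration of what an OTSL label encodes, the sketch below splits a token sequence into table rows. It assumes the five-token OTSL vocabulary (C for a cell, L, U and X for cells that merge with a left, upper, or both neighbours, and NL for end of row) written as whitespace-separated tokens. The exact serialization used in this dataset's label files is not stated on this page, so the token spelling and the helper name `otsl_to_grid` are hypothetical.

```python
from typing import List

# Assumed OTSL vocabulary; the dataset's files may spell these tokens differently.
OTSL_TOKENS = {"C", "L", "U", "X", "NL"}


def otsl_to_grid(sequence: str) -> List[List[str]]:
    """Split a whitespace-separated OTSL sequence into rows of tokens."""
    rows: List[List[str]] = []
    current: List[str] = []
    for token in sequence.split():
        if token not in OTSL_TOKENS:
            raise ValueError(f"unknown OTSL token: {token!r}")
        if token == "NL":
            # NL closes the current row.
            rows.append(current)
            current = []
        else:
            current.append(token)
    if current:
        # Tolerate a final row that lacks a trailing NL.
        rows.append(current)
    # Every row of a valid OTSL sequence has the same number of tokens.
    if len({len(row) for row in rows}) > 1:
        raise ValueError("rows have unequal lengths")
    return rows


# Hypothetical example: a 2x3 table whose first row has a cell spanning two columns.
grid = otsl_to_grid("C L C NL C C C NL")
```

Here `grid` holds one list per table row, and an L token marks a position that belongs to the cell on its left.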

Activity Overview

  • Downloads 6
  • Views 84
  • File Size 530.67 MB

Tags

  • AI4Bharat
  • Sanskrit
  • Digital India
  • multilingual NLP
  • IndiaAI
  • IITB
  • indicnlp
  • Indian Language
  • linguistic diversity
  • IITBombay
  • AIkosha
  • DataforAI
  • IITBImpact
  • BharatGen

License Control

CC0 1.0 Public Domain

MUSTARD_Dataset (1 file, 2 directories)

  • indic (directory): 3 files, 12 directories
  • scenetext (directory): 1 file, 4 directories
  • merged.txt (text/plain): 109.14 KB

Version Control

Version 1 (530.67 MB)
  • admin · 9 month(s) ago
    • MUSTARD_Dataset (folder)
      • indic (folder)
      • scenetext (folder)
      • merged.txt (text/plain)