Government Of India
CATALIST

CAmera TrAnsformations for multi-LIngual Scene Text recognition

About Dataset

Automatic text recognition in videos is challenging because of problems like motion blur, variations in text size and font, and the use of various languages. Movement of the capturing camera and the resulting orientation of the text make the recognition task even more difficult. Attention-based methods have delivered excellent results for scene-text OCR in images. However, they suffer from attention masks becoming unstable and wandering in the scene.

To alleviate this issue, we offer a video dataset that contains scene-text videos along with the camera movements. These videos mainly contain sign boards and number plates. We also provide word-level masks for each video frame. The videos are shot in both indoor and outdoor environments.

Activity Overview

  • Downloads 27
  • Views 342
  • File Size 36 GB

Tags

  • Video
  • Text2Speech
  • Indian Language
  • Machine translation

License Control

CC0 1.0 Public Domain

test (1 directory)

Directory: videos (526 files)

Data Quality Score (Beta)

Version Control

Version 1 (36 GB)
  • admin · 9 month(s) ago
    • test
      • videos
    • train
    • val
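
After downloading, it can be useful to check that each split arrived intact. The following is a minimal sketch, not part of the dataset's official tooling. It assumes the archive unpacks into the `test`, `train`, and `val` folders listed above, and it searches each folder recursively because only `test/videos` is confirmed on this page. The function name `count_videos` and the set of file extensions are hypothetical; the page does not state the video format.

```python
from pathlib import Path

# Assumed container formats; the dataset page does not state the video format.
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov", ".mkv"}


def count_videos(root, splits=("train", "val", "test")):
    """Return {split: number of video files found anywhere under root/split}."""
    counts = {}
    for split in splits:
        split_dir = Path(root) / split
        if not split_dir.is_dir():
            # A missing split is reported as zero rather than raising an error.
            counts[split] = 0
            continue
        counts[split] = sum(
            1
            for path in split_dir.rglob("*")
            if path.is_file() and path.suffix.lower() in VIDEO_EXTENSIONS
        )
    return counts
```

Calling `count_videos("path/to/CATALIST")` returns a dictionary keyed by split name. The 526 files listed for `test/videos` may include non-video files such as masks or camera-movement records, so the video count for `test` need not equal 526.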