OVA Odia Poetry Dataset

The OVA Odia Poetry Dataset is a curated collection of line-level poetic text extracted from 490 Odia poetry works digitized by the Odia Virtual Academy (OVA). It represents a wide range of poetic traditions, themes, and literary styles within Odia literature. The dataset is developed to support language modeling, NLP research, generative AI training and the digital preservation of Odia poetic heritage.

About Dataset

The OVA Odia Poetry Dataset is a comprehensive, machine-readable collection of Odia poetic literature comprising 490 poetry works digitized and curated by the Odia Virtual Academy (OVA). Designed specifically to support the advancement of artificial intelligence and natural language processing for low-resource Indian languages, this dataset represents a significant step toward strengthening the digital and computational presence of Odia.

The dataset captures the diversity, depth, and historical breadth of Odia poetry, reflecting a wide range of poetic traditions, literary movements, themes, and stylistic expressions. From classical and devotional verse to modern, nationalist, and experimental poetry, the corpus embodies the evolution of Odia poetic expression across time. This diversity makes the dataset uniquely valuable for building robust language models that can understand not only contemporary Odia usage but also its rich literary and cultural foundations.

A defining feature of the OVA Odia Poetry Dataset is its line-level structure. Each poetic line is preserved as an independent data entry, enabling fine-grained linguistic and stylistic analysis. This structural choice is particularly important for computational modeling of poetry, as it supports the study of metre, rhyme, rhythm, syntactic variation, and semantic density at the level most natural to poetic composition. Such granularity is essential for training generative AI systems capable of producing coherent and stylistically faithful Odia verse, as well as for tasks such as poetic form recognition, automatic summarisation, and literary pattern analysis.

Purpose of Dataset

The purpose of the OVA Odia Poetry Dataset is to enable the development of high-quality artificial intelligence and language technologies for Odia by providing a structured, machine-readable corpus of poetic literature. It aims to support large language model training, NLP research, and AI applications.

Activity Overview

  • Downloads 0
  • Views 77
  • File Size 18.70 MB

Tags

  • Odia
  • Literature Domain
  • natural language processing (NLP)
  • language-modelling
  • low-resource-languages
  • poetry

License Control

Attribution 4.0 International (CC BY 4.0)

OVA_poetry_dataset (3 directories)

  • test (1 directory)
  • train (1 directory)
  • validation (1 directory)
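
The split layout above can be read with a short script. This is a minimal sketch, not the publisher's loader: the page does not state the file format inside each split, so the code assumes UTF-8 text files with one poetic line per row, and the `*.txt` pattern and the function names `load_split_lines` and `load_dataset` are hypothetical.

```python
from pathlib import Path

# Split names as listed on the dataset page.
SPLITS = ("train", "validation", "test")


def load_split_lines(root, split, pattern="*.txt"):
    """Collect non-empty lines from every matching file under root/split.

    Assumption: files are UTF-8 text with one poetic line per row.
    The "*.txt" pattern is a guess; adjust it to the actual file type.
    """
    lines = []
    # rglob searches nested folders, since each split holds a subdirectory.
    for path in sorted(Path(root, split).rglob(pattern)):
        with path.open(encoding="utf-8") as handle:
            for raw in handle:
                text = raw.strip()
                if text:
                    lines.append(text)
    return lines


def load_dataset(root, pattern="*.txt"):
    """Return a dict mapping each split name to its list of lines."""
    return {split: load_split_lines(root, split, pattern) for split in SPLITS}
```

Calling `load_dataset("OVA_poetry_dataset")` from the folder that contains the download would return one list of lines per split, provided the assumptions above hold for the actual files.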

Version Control

Version 1 (18.70 MB)
  • admin · 4 month(s) ago
    • OVA_poetry_dataset
      • test
      • train
      • validation

Related Datasets

OVA Odia Literature Dataset v1
Updated 2 month(s) ago
This dataset is a curated monolingual corpus of Odia literary texts prepared from books digitized by the Odia Virtual Academy (OVA). The dataset contains sentence-level extractions from multiple books, processed into clean, machine-learning-ready text files.
Tags: Odia, Literature Domain, low-resource-languages
  • Upvoters 0
  • Downloads 6
  • File Size 3.04 MB
  • Views 85

ODIA VIRTUAL ACADEMY, ELECTRONICS & INFORMATION TECHNOLOGY DEPARTMENT, ODISHA