OVA Odia Poetry Dataset

The OVA Odia Poetry Dataset is a curated collection of line-level poetic text extracted from 490 Odia poetry works digitized by the Odia Virtual Academy (OVA). It represents a wide range of poetic traditions, themes, and literary styles within Odia literature. The dataset was developed to support language modeling, NLP research, generative AI training, and the digital preservation of Odia poetic heritage.

About Dataset

The OVA Odia Poetry Dataset is a comprehensive, machine-readable collection of Odia poetic literature comprising 490 poetry works digitized and curated by the Odia Virtual Academy (OVA). Designed specifically to support the advancement of artificial intelligence and natural language processing for low-resource Indian languages, this dataset represents a significant step toward strengthening the digital and computational presence of Odia.

The dataset captures the diversity, depth, and historical breadth of Odia poetry, reflecting a wide range of poetic traditions, literary movements, themes, and stylistic expressions. From classical and devotional verse to modern, nationalist, and experimental poetry, the corpus embodies the evolution of Odia poetic expression across time. This diversity makes the dataset valuable for building robust language models that can understand not only contemporary Odia usage but also its rich literary and cultural foundations.

A defining feature of the OVA Odia Poetry Dataset is its line-level structure. Each poetic line is preserved as an independent data entry, enabling fine-grained linguistic and stylistic analysis. This structural choice is particularly important for computational modeling of poetry, as it supports the study of metre, rhyme, rhythm, syntactic variation, and semantic density at the level most natural to poetic composition. Such granularity is essential for training generative AI systems capable of producing coherent and stylistically faithful Odia verse, as well as for tasks such as poetic form recognition, automatic summarisation, and literary pattern analysis.
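
As a rough illustration of what line-level entries make possible, the sketch below splits a poem into one record per line and groups lines by their trailing characters, a crude stand-in for rhyme detection. The function names, the record fields, and the suffix heuristic are assumptions made for illustration; they are not part of the dataset or of any OVA tooling, and the sample text is a Latin-script placeholder rather than dataset content.

```python
from collections import defaultdict

# Punctuation stripped before comparing line endings; includes the danda
# and double danda used in Odia verse. This set is an assumption.
TRAILING_PUNCT = "।॥ ,.;!?"


def line_records(poem_text):
    """Split a poem into one record per non-empty line."""
    records = []
    for index, raw in enumerate(poem_text.splitlines()):
        line = raw.strip()
        if line:
            records.append(
                {"line_no": index, "text": line, "n_tokens": len(line.split())}
            )
    return records


def group_by_ending(records, suffix_len=2):
    """Group line numbers by their last `suffix_len` characters.

    A very rough proxy for rhyme: it compares code points, not syllables.
    """
    groups = defaultdict(list)
    for rec in records:
        key = rec["text"].rstrip(TRAILING_PUNCT)[-suffix_len:]
        groups[key].append(rec["line_no"])
    return dict(groups)


if __name__ == "__main__":
    # Placeholder text only; not taken from the dataset.
    sample = "la la ra\n\nta ta ra\nna na ma"
    print(group_by_ending(line_records(sample)))
```

Comparing raw code points is only a starting point. For Odia script, a more faithful analysis would likely segment lines into aksharas (orthographic syllables) before comparing endings.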

Purpose of Dataset

The purpose of the OVA Odia Poetry Dataset is to enable the development of high-quality artificial intelligence and language technologies for Odia by providing a structured, machine-readable corpus of poetic literature. It aims to support large language model training, NLP research, and AI applications.

Dataset Metadata

License

Attribution 4.0 International (CC BY 4.0)

Geographical coverage

Odisha, India

Sector

Sector Agnostic

Author

Smruti Ranjan Mishra

Source Organisation

Odia Virtual Academy, Electronics & Information Technology Department, Odisha

Uploaded by

Smruti Ranjan Mishra

Data Quality Score (Beta)

4.82

Dataset type

Unstructured

Frequency

Static

Time Granularity

Year range

04/12/1930 - 04/12/2025

Date & Time

04/12/25 10:06:26

Visibility

Open

Hosted / Redirected

Redirected

Data Type

Secondary

If Redirection which source

https://ova.gov.in/

Activity Overview

0
1
18.70 MB
165

License Control

Attribution 4.0 International (CC BY 4.0)

OVA_poetry_dataset (3 directories)

test (1 directory)

train (1 directory)

validation (1 directory)
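
Given the three split directories listed above, a minimal sketch for reading the corpus might look like the following. The page does not state the file format inside each split, so the `*.txt` pattern, the UTF-8 encoding, and the one-poetic-line-per-text-line layout are all assumptions, and `iter_lines` and `count_lines` are hypothetical helper names.

```python
from pathlib import Path

# Split names as shown in the directory listing.
SPLITS = ("train", "validation", "test")


def iter_lines(root, split, pattern="*.txt"):
    """Yield non-empty lines from every matching file under root/split.

    Assumes UTF-8 text files with one poetic line per text line;
    adjust `pattern` if the files use a different extension.
    """
    split_dir = Path(root) / split
    for path in sorted(split_dir.rglob(pattern)):
        with path.open(encoding="utf-8") as handle:
            for raw in handle:
                line = raw.strip()
                if line:
                    yield line


def count_lines(root):
    """Return the number of non-empty lines found in each split."""
    return {split: sum(1 for _ in iter_lines(root, split)) for split in SPLITS}


if __name__ == "__main__":
    # "OVA_poetry_dataset" is the top-level folder name from the listing;
    # the local path is an assumption about where it was extracted.
    print(count_lines("OVA_poetry_dataset"))
```

Using `rglob` rather than a fixed path keeps the sketch independent of the name of the single subdirectory inside each split, which the listing does not show.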

Version Control

Version 1 (18.70 MB)

admin · 7 months ago
- OVA_poetry_dataset
  test
  train
  validation

Related Datasets

Updated 5 month(s) ago

OVA Odia Literature Dataset v1

This dataset is a curated monolingual corpus of Odia literary texts prepared from books digitized by the Odia Virtual Academy (OVA). The dataset contains sentence-level extractions from multiple books processed into clean, machine learning ready text files.

Odia

Literature Domain

low-resource-languages

0
8
3.04 MB
130

Odia Virtual Academy, Electronics & Information Technology Department, Odisha
