OVA Odia Prose Literature Dataset

The OVA Odia Prose Literature Dataset is a curated collection of sentence-level text extracted from 1,143 Odia books digitized by the Odia Virtual Academy (OVA). It spans multiple domains including prose, culture, autobiographies, biography, travel writing, plays, criticism, short story collections, essays, religion and philosophy, scientific writing, and history. The dataset is developed to support language modelling, NLP research and generative AI training.

About Dataset

The OVA Odia Prose Literature Dataset is a curated compilation of sentence-level text drawn from 1,143 Odia books digitized by the Odia Virtual Academy (OVA). It has been developed to serve as a structured, machine-learning-ready resource for natural language processing, linguistic research, and generative AI development in Odia. The dataset covers a broad range of domains, including prose, culture, autobiographies, biography, travel writing, plays, criticism, short story collections, essays, religion and philosophy, scientific writing, and history, providing a wide representation of Odia’s literary and intellectual traditions.

The dataset brings together works spanning different periods and writing styles, offering a diverse view of Odia language usage. By extracting content at the sentence level, the dataset aligns with the requirements of modern NLP models that benefit from clean and consistent input units. This structure enables direct use in tasks such as language modeling, translation, summarization, and text generation, as well as analytical tasks that require segmented and standardized text.

The variety of source domains contributes to the richness of linguistic patterns within the dataset. It reflects narrative writing, analytical exposition, reflective prose, conversational text, descriptive passages, historical narration, and technical explanation. This mixture helps models and researchers access a more complete picture of Odia as it appears across literature, scholarship, personal writing, and documentation. The presence of texts from different genres allows the dataset to capture differences in vocabulary, tone, sentence construction, and stylistic form, which is important for building AI systems designed to handle real-world usage rather than narrow subsets of the language.

Purpose of Dataset

The purpose of this dataset is to provide sentence-level Odia text, extracted from digitized books and segmented using Odia danda and question-mark delimiters, to support AI training. By transforming each book into a clean CSV file with individual sentences as rows, the dataset enables language modeling, text processing, and other NLP tasks that require structured, high-quality Odia textual data.
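The segmentation described above can be illustrated with a minimal Python sketch. It is not the pipeline OVA used. It assumes that the danda is the character U+0964, that each delimiter stays attached to the sentence it closes, and that the CSV has a single column named `sentence`. The function names and the column name are hypothetical.

```python
import csv
import io
import re

DANDA = "\u0964"  # the danda character used as a sentence terminator in Odia

# A sentence is a run of non-delimiter characters followed by any
# closing dandas or question marks. Keeping the delimiter is an assumption.
_SENTENCE = re.compile(r"[^\u0964?]+[\u0964?]*")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences on danda and question-mark delimiters."""
    chunks = (m.strip() for m in _SENTENCE.findall(text))
    return [c for c in chunks if c]


def write_sentences_csv(sentences: list[str], out) -> None:
    """Write one sentence per row; the 'sentence' header is hypothetical."""
    writer = csv.writer(out)
    writer.writerow(["sentence"])
    for sentence in sentences:
        writer.writerow([sentence])


if __name__ == "__main__":
    sample = f"ମୁଁ ଘରକୁ ଯାଉଛି{DANDA} ତୁମେ କେଉଁଠି? ସେ ଆସିବ{DANDA}"
    buffer = io.StringIO()
    write_sentences_csv(split_sentences(sample), buffer)
    print(buffer.getvalue())
```

Text that follows the last delimiter is kept as a final sentence rather than dropped, so a book that ends without a danda loses nothing.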

Activity Overview

  • Downloads0
  • Downloads 2
  • Views 67
  • File Size 107.84 MB

Tags

  • Odia
  • Literature Domain
  • monolingual corpora
  • language-modelling
  • low-resource-languages
  • prose
  • textual data
  • nlp

License Control

Attribution 4.0 International (CC BY 4.0)

OVA_prose_dataset (3 directories)

  • test (1 directory)
  • train (1 directory)
  • validation (1 directory)
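A quick way to check a downloaded copy against this layout is to count CSV rows per split. The sketch below is illustrative only. It assumes that the per-book files use a `.csv` extension and UTF-8 encoding, and that they sit somewhere beneath each split directory. The page does not name the inner directories, so the search is recursive. The function name is hypothetical.

```python
import csv
from pathlib import Path

SPLITS = ("train", "validation", "test")


def count_rows_per_split(root) -> dict[str, int]:
    """Count non-empty CSV rows under each split directory of `root`."""
    root = Path(root)
    counts = {}
    for split in SPLITS:
        split_dir = root / split
        total = 0
        if split_dir.is_dir():
            # Recursive search, since the inner directory names are unknown.
            for csv_path in sorted(split_dir.rglob("*.csv")):
                with csv_path.open(newline="", encoding="utf-8") as f:
                    total += sum(1 for row in csv.reader(f) if row)
        counts[split] = total
    return counts


if __name__ == "__main__":
    # "OVA_prose_dataset" is the top-level folder name shown on this page.
    print(count_rows_per_split("OVA_prose_dataset"))
```

If the files carry a header row, each file contributes one extra row to the count.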

Version Control

Version 1 (107.84 MB)
  • admin · 4 month(s) ago
    • OVA_prose_dataset
      • test
      • train
      • validation

Related Datasets

Updated 2 month(s) ago
OVA Odia Literature Dataset v1
Information
This dataset is a curated monolingual corpus of Odia literary texts prepared from books digitized by the Odia Virtual Academy (OVA). The dataset contains sentence-level extractions from multiple books processed into clean, machine-learning-ready text files.
Odia
Literature Domain
low-resource-languages
  • Upvoters 0
  • Downloads 6
  • File Size 3.04 MB
  • Views 85

ODIA VIRTUAL ACADEMY, ELECTRONICS & INFORMATION TECHNOLOGY DEPARTMENT, ODISHA