Government Of India
OVA Odia Literature Dataset v1

This dataset is a curated monolingual corpus of Odia literary texts prepared from books digitized by the Odia Virtual Academy (OVA). It contains sentence-level extractions from multiple books, processed into clean, machine-learning-ready CSV files.

About Dataset

This dataset is a curated monolingual corpus of Odia literary texts derived from books digitized by the Odia Virtual Academy (OVA). Each digitized book has been processed into a clean, UTF-8 encoded, machine-learning-ready CSV file in which every row holds a single sentence. Sentences are segmented with Odia-specific punctuation rules, splitting primarily on the Odia danda (।) and the question mark (?), so that segmentation is linguistically consistent. Because diverse literary works are compiled into one uniform, sentence-level structure, the dataset can serve as a foundation for tasks such as language modeling, translation, text classification, and broader computational studies of the Odia language.
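The segmentation rule described above can be illustrated with a minimal Python sketch. This is not the pipeline used to build the dataset. The function name `split_sentences` is hypothetical, and the sketch assumes that the danda (U+0964) and the question mark are the only sentence terminators and that each terminator stays attached to its sentence.

```python
import re

# One sentence = a run of non-terminator characters followed by an optional
# Odia danda (U+0964) or question mark. Assumption: only these two
# characters end a sentence; the real pipeline may apply further rules.
_SENTENCE = re.compile(r"[^।?]+[।?]?")


def split_sentences(text):
    """Split Odia text into sentences, keeping the closing punctuation."""
    pieces = (match.strip() for match in _SENTENCE.findall(text))
    return [piece for piece in pieces if piece]


if __name__ == "__main__":
    sample = "ମୁଁ ଭଲ ଅଛି। ତୁମେ କେମିତି ଅଛ?"
    for sentence in split_sentences(sample):
        print(sentence)
```

Text that follows the last terminator is kept as a final sentence rather than dropped, which is a design choice of this sketch and not something the dataset description specifies.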

Purpose of Dataset

The primary purpose of this dataset is to support the development and evaluation of natural language processing tools and linguistic research for the Odia language, enabling tasks such as language modeling, machine translation, and text classification.

Activity Overview

  • Downloads 5
  • Views 50
  • File Size 3.04 MB

Tags

  • Odia
  • Literature Domain
  • low-resource-languages

License Control

Attribution 4.0 International (CC BY 4.0)

OVA_dataset_v1 (3 directories)

  • test (1 directory)
  • train (1 directory)
  • validation (1 directory)
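
Given the train, validation, and test directories listed above, the sentences of one split could be read with a sketch like the following. The page does not state file names, nesting below each split, or column layout, so several points are assumptions: the helper `load_split` is hypothetical, CSV files are searched recursively under each split directory, and the sentence is taken from the first column of each row.

```python
import csv
from pathlib import Path

# Split directory names as listed on the dataset page.
SPLITS = ("train", "validation", "test")


def load_split(root, split):
    """Collect the first-column text of every row in every CSV under root/split.

    Assumption: sentences sit in the first column. If the files carry a
    header row, it is returned like any other row and must be dropped
    by the caller.
    """
    sentences = []
    for csv_path in sorted(Path(root, split).rglob("*.csv")):
        with open(csv_path, encoding="utf-8", newline="") as handle:
            for row in csv.reader(handle):
                if row and row[0].strip():
                    sentences.append(row[0].strip())
    return sentences


if __name__ == "__main__":
    # "OVA_dataset_v1" is the top-level folder name shown on the page;
    # adjust the path to wherever the download was extracted.
    for name in SPLITS:
        print(name, len(load_split("OVA_dataset_v1", name)))
```

Sorting the file paths keeps the reading order deterministic across runs, which helps when comparing sentence counts between splits.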

Version Control

Version 1 (3.04 MB)
  • admin · 3 months ago
    • OVA_dataset_v1
      • test
      • train
      • validation