Multidialectal Pradesh Odia Speech Repository (MPOSR)

The MPOSR is a multidialect dataset developed by Subhranshu Kumar Singh as part of Ph.D. work under the supervision of Prof. (Dr.) Jibanananda Mehena at DRIEMS University. It was created under the project titled "Development of a Multidialect Odia Speech Corpus and Automatic Speech Recognition Framework for Inclusive Language Technology", under the Bhashini initiative for advancing Automatic Speech Recognition (ASR) in Indian languages.

About Dataset

The aim is to develop a standardized Odia dialectal speech corpus through structured student participation at DRIEMS University, using 83 common Odia phrases and a short narrative text to capture regional and gender variations. The corpus bridges accent and language diversity across districts of Odisha and India, enabling robust ASR and TTS systems that support accurate spoken and written communication for both regional and national communities. The dataset was initially uploaded with contributions from 10 students and has since been expanded to 17 student contributors and volunteers from multiple regions of Odisha.

The MPOSR dataset includes speech samples from the East Coastal, Northern, Southern, and Western regions of Odisha. The East Coastal Region is further categorized into Middle Coastal, Northern Coastal, and Southern Coastal sub-regions. The Middle Coastal data was uploaded in the earlier phase, while the additional coastal subdivisions are incorporated in the current version update.

Acknowledgement: The authors gratefully acknowledge the mentoring, guidance, and technical support provided by Dr. Priya Ranjan, Full Professor, Department of Computer Science and Engineering, Mody University, Lakshmangarh, Sikar, Rajasthan, India, whose expertise and academic insights significantly contributed to the conceptualization and standardization of this dataset. The authors also sincerely acknowledge Dr. Subhankar Ghosal, Assistant Professor, Department of Computer Science and Engineering, Jain (Deemed to be) University, for his valuable guidance, constructive feedback, and continuous support during the dataset development and validation process. The contribution and institutional support of DRIEMS University, Cuttack, Odisha, in facilitating structured data collection, supervision, and quality assurance under an academic research framework is also duly acknowledged.

Purpose of Dataset

This dataset is developed to support research and development in Automatic Speech Recognition (ASR) and speech technology for the Odia language, with emphasis on regional and gender-based speech variations. Collected through a structured institutional initiative at DRIEMS University, the corpus includes standardized Odia phrases and short narrative utterances recorded by contributors from multiple districts of Odisha. The dataset is carefully annotated with metadata such as district, gender, speaker identifiers, and duration to ensure traceability and reproducibility. Released in a phase-wise manner, the corpus aims to strengthen language technology resources for low-resource Indian languages and contribute to the objectives of the Digital India Bhashini Mission.
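
As an illustration of how the metadata fields mentioned above (district, gender, speaker identifier, duration) might be used, the following minimal Python sketch sums recording time per district and gender. The column names (`speaker_id`, `district`, `gender`, `duration_sec`), the inline sample rows, and the helper `total_duration_by_group` are all assumptions made for this example; they are not taken from the released files, whose actual layout may differ.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows for illustration only. The column names and values
# are assumptions, not the actual MPOSR metadata schema or contents.
SAMPLE_METADATA = """speaker_id,district,gender,duration_sec
SPK001,Cuttack,F,3.2
SPK002,Cuttack,M,2.8
SPK003,Sambalpur,F,4.1
"""


def total_duration_by_group(csv_text, keys=("district", "gender")):
    """Sum utterance durations (seconds) per combination of the given columns."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        group = tuple(row[k] for k in keys)
        totals[group] += float(row["duration_sec"])
    return dict(totals)


if __name__ == "__main__":
    # Print one line per (district, gender) group with its total duration.
    for group, seconds in sorted(total_duration_by_group(SAMPLE_METADATA).items()):
        print(group, round(seconds, 2))
```

A summary of this kind can help check how balanced the recordings are across regions and genders before training or evaluating an ASR model. To use it on real data, replace the sample string with the contents of the dataset's own metadata file and adjust the column names accordingly.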