Government of India
English Malayalam Parallel Corpus

A bilingual parallel corpus containing paired English and Malayalam sentences designed for machine translation, multilingual NLP research, and language model training.

About Dataset

This dataset is a collection of parallel text in English and Malayalam that can be used for applications such as machine translation, language learning, natural language processing, and language preservation. It contains text samples from various domains, including transportation and travel. The primary objective of the dataset is to facilitate the development of machine translation models for Malayalam and to contribute to NLP research and applications for Indian languages.

This dataset was identified and facilitated for onboarding as part of the Dataset Onboarding Support Team (DOST) initiative led by CivicDataLab (CDL), in partnership with the Gates Foundation and in collaboration with BHASHINI. CivicDataLab provided technical support for dataset discovery, validation, metadata preparation, and onboarding facilitation. All dataset ownership and intellectual property rights remain with the original author(s).

Purpose of Dataset

This dataset is designed to support the development of Malayalam machine translation systems and to advance natural language processing research for Indian languages. It can be used for machine translation, language learning applications, multilingual NLP tasks such as text classification and sentiment analysis, language modeling, and the preservation of the Malayalam language and its cultural heritage.
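Most of the uses listed above, and machine translation in particular, begin by holding out evaluation data from the training pairs. The sketch below shows one minimal way to do this. It assumes the corpus has already been read into a list of `(English, Malayalam)` string tuples. The function name `split_pairs`, the 1% dev/test fractions, and the seed are illustrative choices and are not part of the dataset.

```python
import random
from typing import List, Tuple

Pair = Tuple[str, str]  # (English sentence, Malayalam sentence)


def split_pairs(
    pairs: List[Pair],
    dev_frac: float = 0.01,   # illustrative default, not prescribed by the dataset
    test_frac: float = 0.01,  # illustrative default, not prescribed by the dataset
    seed: int = 13,
) -> Tuple[List[Pair], List[Pair], List[Pair]]:
    """Reproducibly shuffle sentence pairs and cut them into train/dev/test.

    Exact duplicate pairs are removed first, so the same pair cannot
    appear in both the training set and an evaluation set.
    """
    # dict.fromkeys drops duplicates while keeping first-seen order
    unique = list(dict.fromkeys(pairs))
    rng = random.Random(seed)  # fixed seed gives the same split on every run
    rng.shuffle(unique)

    n = len(unique)
    n_dev = int(n * dev_frac)
    n_test = int(n * test_frac)
    dev = unique[:n_dev]
    test = unique[n_dev:n_dev + n_test]
    train = unique[n_dev + n_test:]
    return train, dev, test
```

For example, 1,000 distinct pairs yield 980 training, 10 dev, and 10 test pairs. Deduplicating before the shuffle stops repeated sentence pairs from leaking between the training and evaluation sets. Near-duplicates that differ only in spacing or punctuation are not caught by this step.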

Activity Overview

  • Downloads: 1
  • File size: 70.28 MB
  • Views: 3

Tags

  • English to Malayalam translation
  • Machine translation
  • Language learning
  • Natural language processing
  • Language preservation
  • Text analysis
  • Language modeling
  • Malayalam language
  • Indian languages
  • Parallel corpus
  • Text-to-Text

License

Database Contents License (DbCL) v1.0

Version Control

Version 1 (70.28 MB)
  • Nikil Augustine · 1 day(s) ago
    • text/csv
      English_Malayalam_Parallel.csv
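The corpus is distributed as a single CSV file. Below is a minimal Python sketch for reading it into sentence pairs. It assumes, without confirmation from this page, that the file has a header row and that the first column holds English text and the second holds Malayalam text. The helper name `load_parallel_csv` is hypothetical. Inspect the file's actual header before relying on this layout.

```python
import csv
from typing import List, Tuple


def load_parallel_csv(path: str, has_header: bool = True) -> List[Tuple[str, str]]:
    """Read (English, Malayalam) sentence pairs from a two-column CSV file.

    ASSUMPTION: column 0 is English and column 1 is Malayalam; this layout
    is not documented on the dataset page and should be checked first.
    """
    pairs: List[Tuple[str, str]] = []
    # utf-8-sig also strips a leading byte-order mark if the file has one
    with open(path, newline="", encoding="utf-8-sig") as f:
        reader = csv.reader(f)
        if has_header:
            next(reader, None)  # skip the header row
        for row in reader:
            if len(row) < 2:
                continue  # skip blank or malformed rows
            en, ml = row[0].strip(), row[1].strip()
            if en and ml:  # drop pairs where either side is empty
                pairs.append((en, ml))
    return pairs
```

For example, `load_parallel_csv("English_Malayalam_Parallel.csv")` returns a list of `(English, Malayalam)` tuples, with empty or malformed rows skipped. Use the standard `csv` module here rather than splitting lines on commas, because sentences often contain commas inside quoted fields.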