COIL-D Science and Technology v2

The Hindi to Indian Languages Science and Technology Translation Dataset is a parallel corpus for translating Hindi into multiple Indian languages in the science and technology domain. It includes popular science explainers, technology magazines, digital service instructions, research announcements, and STEM education snippets, and it supports multilingual machine translation and cross-lingual NLP research.

About Dataset

Science-and-Technology_v2: Multilingual STEM Translation

Science-and-Technology_v2 (SAT_v2) is a specialized parallel corpus covering popular science explainers, research announcements, and digital service instructions.

Folder Structure & Quality

Following the - convention, this dataset includes manually verified translations:

Path: Science-and-Technology_v2 / - / source_reviewed / SAT / *.txt
Quality: All files in this version are source-reviewed…

See the full description on the dataset page: https://huggingface.co/datasets/coild-aikosh/Science-and-Technology_v2.
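The folder layout described above can be walked with a short script. The following is a minimal sketch, assuming the dataset has already been downloaded to a local directory. The second path component is elided in the description, so the sketch matches any folder at that level rather than guessing its naming scheme. The function name `list_reviewed_files` is hypothetical and not part of the dataset.

```python
from pathlib import Path


def list_reviewed_files(root):
    """Group source-reviewed .txt files by their second-level folder name.

    `root` is assumed to be a local copy of the Science-and-Technology_v2
    folder. The second path component is not spelled out in the dataset
    description, so any folder name is accepted at that level.
    """
    root = Path(root)
    grouped = {}
    # Mirrors the documented layout: <root> / <folder> / source_reviewed / SAT / *.txt
    for path in sorted(root.glob("*/source_reviewed/SAT/*.txt")):
        folder = path.relative_to(root).parts[0]
        grouped.setdefault(folder, []).append(path)
    return grouped


if __name__ == "__main__":
    # Hypothetical local path; adjust to wherever the dataset was saved.
    for folder, files in list_reviewed_files("Science-and-Technology_v2").items():
        print(folder, len(files))
```

Grouping by the second-level folder keeps the files for each translation direction together without assuming how those folders are named.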

Purpose of Dataset

NMT training: fine-tuning models for accuracy in the technical, scientific, and digital domains.
Digital literacy: powering tools that translate complex technology concepts into regional Indian languages.

Activity Overview

  • Downloads 0
  • Redirect 0
  • Views 10
  • File Size 0

Tags

  • Parallel Corpus
  • Indian Languages
  • science
  • stem
  • technology

License Control

Attribution 4.0 International (CC BY 4.0)