Government Of India
indic_align_wikihow

The dataset is a multilingual parallel corpus derived from WikiHow — a large repository of step-by-step how-to guides.

About Dataset

The WikiHow guides have been translated and aligned into multiple Indian languages, making the dataset a useful resource for training and evaluating models on instruction following, cross-lingual transfer learning, and multilingual task understanding.


Note on Encoding:
This dataset is encoded in UTF-8 format.

  • Windows users:
    To ensure non-ASCII characters display correctly in Excel, first download the .csv file and open it in Notepad. Choose File → Save As, set the encoding to UTF-8 with BOM, and save. Then open the saved file in Excel.

  • macOS users:
    You can open the CSV file directly in Excel or any other spreadsheet software without issues.
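
The Notepad step for Windows can also be done from a script, which may be more practical for a file of this size. The following is a minimal sketch, not part of the dataset's own tooling: the helper name add_utf8_bom and the output file name are hypothetical, and it assumes the input is already valid UTF-8, as stated above. It streams the file in chunks and prepends the three-byte UTF-8 byte order mark only if it is not already present.

```python
import shutil
from pathlib import Path

# The UTF-8 byte order mark that Excel on Windows uses to detect UTF-8.
BOM = b"\xef\xbb\xbf"


def add_utf8_bom(src, dst):
    """Copy the UTF-8 file `src` to `dst`, prepending a BOM if it is missing.

    Hypothetical helper for illustration. `src` and `dst` must be
    different paths. The bytes are copied unchanged, so the function
    does not validate that the input really is UTF-8.
    """
    src, dst = Path(src), Path(dst)
    with src.open("rb") as fin, dst.open("wb") as fout:
        head = fin.read(len(BOM))
        if head != BOM:
            fout.write(BOM)          # add the marker only when absent
        fout.write(head)             # keep the bytes already read
        shutil.copyfileobj(fin, fout)  # stream the rest in chunks
    return dst
```

For example, add_utf8_bom("train_subset_3.csv", "train_subset_3_bom.csv") would write a copy that Excel on Windows should recognize as UTF-8. When reading either version in Python, opening the file with encoding="utf-8-sig" decodes the text and drops the BOM if one is present.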

Activity Overview

  • Downloads 4
  • Views 40
  • File Size 718.81 MB

Tags

  • multilingual NLP
  • indicnlp
  • natural language processing (NLP)
  • llm
  • language research

License Control

Creative Commons Attribution 4.0 International (CC BY 4.0)

Version Control

Version 1 (718.81 MB)
  • admin · 7 month(s) ago
    • text/csv
      train_subset_3.csv