indic_align_wikihow

The dataset is a multilingual parallel corpus derived from WikiHow — a large repository of step-by-step how-to guides.

About Dataset

This dataset has been translated and aligned into multiple Indian languages, making it a useful resource for training and evaluating models on instruction following, cross-lingual transfer learning, and multilingual task understanding.

Note on Encoding:
This dataset is encoded in UTF-8.

Windows users:
To ensure that non-ASCII characters display properly in Excel, first download the .csv file, open it in Notepad, choose File → Save As, and in the Encoding drop-down select UTF-8 with BOM. Then open the saved file in Excel.
macOS users:
You can open the CSV file directly in Excel or any spreadsheet software without any issues.
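
Scripted alternative (optional):
If you prefer not to re-save the file by hand in Notepad, the same conversion can be scripted. The following is a minimal Python sketch, not part of the dataset itself: the function name add_utf8_bom and the file names in the usage comment are hypothetical, and it assumes the downloaded file is valid UTF-8. It relies on Python's built-in "utf-8-sig" codec, which writes a BOM at the start of the file and skips one if it is already present when reading.

```python
def add_utf8_bom(src: str, dst: str) -> None:
    """Re-save a UTF-8 text file as UTF-8 with BOM.

    Hypothetical helper for illustration; assumes `src` is valid UTF-8.
    """
    # Reading with "utf-8-sig" drops a leading BOM if the file already has one,
    # so running this twice does not produce a double BOM.
    # newline="" keeps the original line endings untouched, which matters
    # for CSV fields that contain embedded line breaks.
    with open(src, "r", encoding="utf-8-sig", newline="") as fin:
        text = fin.read()

    # Writing with "utf-8-sig" prepends the BOM bytes EF BB BF.
    with open(dst, "w", encoding="utf-8-sig", newline="") as fout:
        fout.write(text)


# Example call (file names are placeholders, adjust to your download):
# add_utf8_bom("downloaded.csv", "downloaded_with_bom.csv")
```

The resulting file can then be opened in Excel on Windows in the same way as a file saved from Notepad with UTF-8 with BOM.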