The dataset is a multilingual parallel corpus derived from WikiHow — a large repository of step-by-step how-to guides.
This dataset has been translated into and aligned across multiple Indian languages, making it an excellent resource for training and evaluating models on instruction following, cross-lingual transfer learning, and multilingual task understanding.
Note on Encoding:
This dataset is encoded in UTF-8 format.
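If you load the file programmatically rather than in a spreadsheet, a minimal Python sketch follows. It assumes the standard csv module and a local copy of the file. The function name read_utf8_csv is hypothetical, and this is not an official loader. It sets the encoding explicitly instead of relying on the platform default, which on Windows is often not UTF-8. The "utf-8-sig" codec reads plain UTF-8 unchanged. It also strips a leading byte order mark if one is present, so a copy re-saved as "UTF-8 with BOM" loads the same way.

```python
import csv
from pathlib import Path


def read_utf8_csv(path):
    """Read a UTF-8 encoded CSV file and return its rows as dictionaries.

    encoding="utf-8-sig" decodes plain UTF-8 as-is and also removes a
    leading byte order mark (BOM) if present, so the BOM never ends up
    glued to the first column name.
    newline="" lets the csv module handle line endings itself, as the
    csv documentation recommends.
    """
    with open(Path(path), encoding="utf-8-sig", newline="") as f:
        return list(csv.DictReader(f))
```

Usage: rows = read_utf8_csv("path/to/downloaded.csv"). The path is a placeholder for wherever you saved the file.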
Windows users:
To display non-ASCII characters correctly in Excel, first download the .csv file and open it in Notepad. Choose File → Save As, select UTF-8 with BOM from the Encoding drop-down, and save. Then open the saved file in Excel.
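The key effect of the Notepad step is to prepend a UTF-8 byte order mark (the bytes EF BB BF), which Excel uses to recognize the file as UTF-8. As a scripted alternative, the sketch below defines a hypothetical helper, add_utf8_bom, that does the same thing. It assumes the source file is valid UTF-8. It works on raw bytes, so line endings and field contents are copied unchanged.

```python
import codecs
from pathlib import Path


def add_utf8_bom(src, dst):
    """Copy the UTF-8 file at src to dst, prefixed with a UTF-8 BOM.

    The file is handled as raw bytes, so line endings and field contents
    are preserved exactly. If src already starts with a BOM, no second
    BOM is added.
    """
    data = Path(src).read_bytes()
    data.decode("utf-8")  # validation only: raises UnicodeDecodeError if src is not UTF-8
    if not data.startswith(codecs.BOM_UTF8):
        data = codecs.BOM_UTF8 + data
    Path(dst).write_bytes(data)
```

Usage: add_utf8_bom("downloaded.csv", "downloaded_excel.csv"), then open downloaded_excel.csv in Excel. Both file names are placeholders. Writing to a separate output file leaves the original download untouched.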
macOS users:
You can open the CSV file directly in Excel or any spreadsheet software without any issues.
Creative Commons Attribution 4.0 International (CC BY 4.0)
© 2026 AIKosh. All rights reserved. This portal is developed by the National e-Governance Division for the AIKosh mission.