Government Of India
bhasha-sft_aya_dataset

The "bhasha-sft" dataset, and in particular its "aya_dataset" subset, is intended for supervised fine-tuning (SFT) of language models for Indic languages.

About Dataset

The "bhasha-sft" dataset, specifically its "aya_dataset" subset, is a linguistic resource intended to support machine learning and natural language processing (NLP) work on Indic languages. Drawn from the Bhasha SFT (supervised fine-tuning) collection, it contains a diverse range of prompt-and-response text that can serve as training material for instruction-tuned language models.


Note on Encoding:
This dataset is encoded in UTF-8 format.

  • Windows users:
    To display non-ASCII characters (such as Indic scripts) correctly in Excel, first download the .csv file, open it in Notepad, choose File → Save As, and select "UTF-8 with BOM" as the encoding. Then open the saved file in Excel.

  • macOS users:
    You can open the CSV file directly in Excel or any other spreadsheet software without issues.
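The Notepad step for Windows can also be scripted. The following is a minimal sketch, assuming Python 3 and a local copy of the downloaded CSV. It prepends the UTF-8 byte order mark (BOM) that Excel uses to detect the encoding. The function name `add_utf8_bom` and the output file name are hypothetical and are not part of the dataset.

```python
import codecs
from pathlib import Path


def add_utf8_bom(src: str, dst: str) -> None:
    """Copy a UTF-8 CSV from src to dst, prefixed with a UTF-8 BOM."""
    data = Path(src).read_bytes()
    # Fail loudly if the input is not valid UTF-8 rather than writing a
    # file that Excel would still display incorrectly.
    data.decode("utf-8")
    # Add the BOM (EF BB BF) only if it is not already present, so that
    # running the function twice does not produce a doubled BOM.
    if not data.startswith(codecs.BOM_UTF8):
        data = codecs.BOM_UTF8 + data
    # Working on raw bytes preserves the original line endings unchanged.
    Path(dst).write_bytes(data)
```

For example, `add_utf8_bom("train_subset.csv", "train_subset_bom.csv")` writes a copy that Excel should open with the correct encoding. The original file is left untouched.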

Activity Overview

  • Downloads 11
  • Views 87
  • File Size 13.20 MB

Tags

  • indicnlp
  • Indic language
  • natural language processing (NLP)
  • multilingual corpus

License Control

Attribution 4.0 International (CC BY 4.0)


Version Control

  • Version 1 (13.20 MB), uploaded by admin 7 month(s) ago
    • train_subset.csv (text/csv)
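The following is a minimal sketch, assuming Python 3, for loading train_subset.csv with the standard library. Because this page does not list the column names, the sketch does not assume any particular ones and returns them as they appear in the file's header row. The function name `read_csv_rows` is hypothetical.

```python
import csv


def read_csv_rows(path: str) -> tuple[list[str], list[dict[str, str]]]:
    """Return (column names, rows) from a UTF-8 encoded CSV file."""
    # "utf-8-sig" decodes plain UTF-8 and also strips a leading BOM, so a
    # copy re-saved as "UTF-8 with BOM" loads the same way as the original.
    # newline="" lets the csv module handle line breaks inside quoted fields.
    with open(path, encoding="utf-8-sig", newline="") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        # fieldnames comes from the header row; None means the file was empty.
        return list(reader.fieldnames or []), rows
```

Calling `cols, rows = read_csv_rows("train_subset.csv")` lets you inspect `cols` before relying on any specific field. For a file of about 13 MB, loading all rows into memory like this should be practical.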