An Automatic Speech Recognition (ASR) model that processes audio and transcribes spoken content into text.
Data2vec-aqc is a self-supervised learning (SSL) model for speech representation learning, specifically designed to improve Automatic Speech Recognition (ASR) in low-resource settings. It is built using the Fairseq toolkit and extends the original data2vec framework with three key modules: a quantizer (similar to wav2vec 2.0), a clustering module (from ccc-wav2vec 2.0), and a cross-contrastive loss mechanism.

The model uses a Transformer-based architecture, consistent with data2vec, and operates in a teacher-student training setup. The student network processes randomly augmented versions of audio samples, while the teacher network (an exponential moving average of the student) provides target representations. The student learns to predict the teacher's contextualised latent representations, enabling robust feature learning.
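The teacher-student setup described above can be sketched in a few lines. This is a minimal, illustrative example of the exponential-moving-average (EMA) teacher update used in data2vec-style training; the function name and scalar "parameters" are hypothetical stand-ins, not the actual Fairseq API.

```python
# Illustrative sketch of a data2vec-style EMA teacher update.
# Names and shapes are assumptions; real training updates full tensors.

def ema_update(teacher_params, student_params, decay=0.999):
    """Blend teacher toward student: t_new = decay * t + (1 - decay) * s.

    With decay close to 1, the teacher changes slowly, providing
    stable target representations for the student to predict.
    """
    return [decay * t + (1.0 - decay) * s
            for t, s in zip(teacher_params, student_params)]

# Toy example with scalar weights standing in for model parameters.
teacher = [0.0, 0.0]
student = [1.0, 2.0]
teacher = ema_update(teacher, student, decay=0.9)
print(teacher)  # [0.1, 0.2] (up to floating-point rounding)
```

In practice the decay is typically annealed toward 1.0 over training, so the teacher tracks the student quickly at first and then stabilises.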
Attribution 4.0 International (CC BY- 4.0)
SPRING LAB IITM
Fine-Tuned Model
PyTorch
Open
Science, Technology and Research
23/01/26 08:06:08
3.52 GB