A SpeechT5 model fine-tuned for automatic speech recognition (ASR), transcribing spoken language into written text.
The SpeechT5 ASR model converts spoken audio into written text. It builds on the unified-modal SpeechT5 framework, which pairs a shared encoder-decoder network with modality-specific pre-nets and post-nets for speech and text. The framework is pre-trained on large amounts of unlabeled data and then fine-tuned for speech recognition, which helps it handle a range of speech patterns. Typical applications include transcription services, voice-controlled interfaces, and real-time captioning.
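A minimal usage sketch in Python, assuming the model is loaded through the Hugging Face `transformers` library (`SpeechT5Processor` and `SpeechT5ForSpeechToText`). The checkpoint name `microsoft/speecht5_asr`, the 16 kHz sampling rate, and the helper names `check_sampling_rate` and `transcribe` are assumptions for illustration and are not stated on this page.

```python
# Assumption: the public SpeechT5 checkpoints expect 16 kHz mono audio.
EXPECTED_SAMPLING_RATE = 16_000


def check_sampling_rate(sampling_rate: int) -> int:
    """Return the sampling rate if it matches the assumed 16 kHz, else raise."""
    if sampling_rate != EXPECTED_SAMPLING_RATE:
        raise ValueError(
            f"expected {EXPECTED_SAMPLING_RATE} Hz audio, got {sampling_rate} Hz; "
            "resample the waveform first"
        )
    return sampling_rate


def transcribe(waveform, sampling_rate: int,
               checkpoint: str = "microsoft/speecht5_asr",  # assumed checkpoint name
               max_length: int = 100) -> str:
    """Transcribe a 1-D float waveform (list or NumPy array) to text."""
    check_sampling_rate(sampling_rate)

    # Imported lazily so the helper above works without these heavy dependencies.
    from transformers import SpeechT5ForSpeechToText, SpeechT5Processor

    processor = SpeechT5Processor.from_pretrained(checkpoint)
    model = SpeechT5ForSpeechToText.from_pretrained(checkpoint)

    # The processor turns raw audio into the tensors the speech encoder pre-net expects.
    inputs = processor(audio=waveform, sampling_rate=sampling_rate, return_tensors="pt")

    # The decoder generates text token ids, which the processor decodes to a string.
    predicted_ids = model.generate(**inputs, max_length=max_length)
    return processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
```

Calling `transcribe(waveform, 16000)` downloads the checkpoint on first use and requires `transformers` and `torch` to be installed. Audio recorded at another rate would need to be resampled before it is passed in.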
MIT
Ao, Junyi and Wang, Rui and Zhou, Long and Wang, Chengyi and Ren, Shuo and Wu, Yu and Liu, Shujie and Ko, Tom and Li, Qing and Zhang, Yu and Wei, Zhihua and Qian, Yao and Li, Jinyu and Wei, Furu
Speech to Text
N.A.
Open
Sector Agnostic
12/03/25 06:34:41