A model fine-tuned for converting text into natural-sounding speech, leveraging the SpeechT5 framework.
The SpeechT5 Text to Speech model converts written text into speech. It builds on SpeechT5, a unified-modal framework that uses a shared encoder-decoder network with modality-specific pre-nets and post-nets, so a single backbone handles both speech and text. The framework is pre-trained on large-scale unlabeled speech and text data and then fine-tuned for speech synthesis, which makes the model suitable for applications such as virtual assistants, audiobooks, and accessibility tools.
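As a rough illustration of how such a model is typically called, the sketch below uses the SpeechT5 classes from the Hugging Face `transformers` library. The checkpoint names `microsoft/speecht5_tts` and `microsoft/speecht5_hifigan` are assumptions (this listing does not say which weights it hosts), and `synthesize` and `save_wav` are hypothetical helper names introduced only for this example. The model needs a speaker embedding (a 512-dimensional x-vector), which the caller has to supply.

```python
import struct
import wave

SAMPLE_RATE = 16_000  # the SpeechT5 HiFi-GAN vocoder produces 16 kHz mono audio


def save_wav(samples, path, sample_rate=SAMPLE_RATE):
    """Write float samples in [-1, 1] to a 16-bit mono PCM WAV file."""
    # Clip to the valid range, then scale to signed 16-bit integers.
    pcm = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples]
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))


def synthesize(
    text,
    speaker_embedding,
    checkpoint="microsoft/speecht5_tts",              # assumed checkpoint name
    vocoder_checkpoint="microsoft/speecht5_hifigan",  # assumed checkpoint name
):
    """Return a waveform (list of floats) for `text`.

    `speaker_embedding` is expected to be a torch tensor of shape (1, 512).
    """
    # Imported lazily so this module loads even without transformers/torch.
    from transformers import (
        SpeechT5ForTextToSpeech,
        SpeechT5HifiGan,
        SpeechT5Processor,
    )

    processor = SpeechT5Processor.from_pretrained(checkpoint)
    model = SpeechT5ForTextToSpeech.from_pretrained(checkpoint)
    vocoder = SpeechT5HifiGan.from_pretrained(vocoder_checkpoint)

    # Tokenize the text, generate a spectrogram, and vocode it to audio.
    inputs = processor(text=text, return_tensors="pt")
    speech = model.generate_speech(
        inputs["input_ids"], speaker_embedding, vocoder=vocoder
    )
    return speech.tolist()
```

A call might look like `save_wav(synthesize("Hello there.", speaker_embedding), "hello.wav")`, where `speaker_embedding` is an x-vector for the desired voice. Keeping the WAV writer separate from the model call means the audio output step can be checked without downloading any weights.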
MIT
Ao, Junyi and Wang, Rui and Zhou, Long and Wang, Chengyi and Ren, Shuo and Wu, Yu and Liu, Shujie and Ko, Tom and Li, Qing and Zhang, Yu and Wei, Zhihua and Qian, Yao and Li, Jinyu and Wei, Furu
Text to Speech
N.A.
Open
Sector Agnostic
12/03/25 06:34:39
0