A SpeechT5-based model fine-tuned for voice conversion: transforming one speaker's voice into another's while preserving the linguistic content.
The SpeechT5 VC model converts speech from one speaker's voice to another while keeping the original linguistic information intact. Built on the unified-modal SpeechT5 framework, it uses a shared encoder-decoder network with modality-specific pre-nets and post-nets for speech and text. Pre-training on large-scale unlabeled data lets the model capture speaker characteristics, enabling applications such as personalized voice assistants, dubbing, and voice mimicry.
MIT
Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei
Voice Conversion (Speech to Speech)
N.A.
Open
Sector Agnostic
12/03/25 06:34:40