A high-fidelity vocoder that converts spectrograms into waveforms, complementing the SpeechT5 text-to-speech and voice conversion (VC) models.
The SpeechT5 HiFi-GAN vocoder converts spectrograms (time-frequency representations of sound) into audio waveforms. It serves as the final stage for the SpeechT5 text-to-speech and voice conversion models, turning the spectrograms they produce into audible speech. The vocoder follows the HiFi-GAN approach, in which a generator network is trained adversarially against discriminators so that the synthetic speech sounds natural and clear. This makes it suitable for applications that require high-fidelity audio output.
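The spectrogram-to-waveform step described above can be sketched with the Hugging Face `transformers` library, which provides a `SpeechT5HifiGan` class. This is a minimal sketch under stated assumptions, not part of this listing: the checkpoint name `microsoft/speecht5_hifigan`, the helper names `load_pretrained_vocoder` and `spectrogram_to_waveform`, and the `(frames, 80)` log-mel input shape are all assumptions for illustration.

```python
import torch
from transformers import SpeechT5HifiGan


def load_pretrained_vocoder(
    checkpoint: str = "microsoft/speecht5_hifigan",  # assumed checkpoint name
) -> SpeechT5HifiGan:
    """Download the vocoder weights and switch the model to inference mode."""
    vocoder = SpeechT5HifiGan.from_pretrained(checkpoint)
    return vocoder.eval()


def spectrogram_to_waveform(
    vocoder: SpeechT5HifiGan, spectrogram: torch.Tensor
) -> torch.Tensor:
    """Convert a spectrogram of shape (frames, mel_bins) into a 1-D waveform.

    The number of mel bins must match vocoder.config.model_in_dim
    (80 in the library's default configuration).
    """
    with torch.no_grad():  # inference only, no gradients needed
        return vocoder(spectrogram)
```

Calling `spectrogram_to_waveform(load_pretrained_vocoder(), spectrogram)` on a spectrogram produced by a SpeechT5 model should return a one-dimensional tensor of audio samples. With the library's default configuration, each input frame is upsampled to 256 output samples, and the expected sampling rate can be read from `vocoder.config.sampling_rate`.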
MIT
Ao, Junyi and Wang, Rui and Zhou, Long and Wang, Chengyi and Ren, Shuo and Wu, Yu and Liu, Shujie and Ko, Tom and Li, Qing and Zhang, Yu and Wei, Zhihua and Qian, Yao and Li, Jinyu and Wei, Furu
Text to Speech
N.A.
Open
Sector Agnostic
12/03/25 06:34:42
0