This 150M-parameter non-autoregressive speech generative model, developed by BharatGen, is designed for speaker-conditioned text-to-speech in Telugu.
Our speaker-conditioned text-to-speech model is a non-autoregressive speech generative model designed for Indian languages, consisting of two key components: an audio model and a duration predictor. Together, these components comprise approximately 150 million parameters. The audio model is based on continuous normalizing flows (CNFs) and transforms a simple distribution into a complex conditional audio distribution, p(missing audio | text), using a neural network trained with flow matching via vector field regression. The model is trained from scratch on publicly available Indian language datasets and is optimized for speech infilling tasks such as continuous sentence completion and cross-sentence completion. Architectural modifications were made throughout to adapt the system to the diverse phonetic, rhythmic, and intonational patterns of Indian languages.
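To make the flow-matching objective concrete, below is a minimal, illustrative PyTorch sketch of conditional flow matching with vector-field regression: a network is trained to regress the vector field that transports samples from a simple Gaussian prior toward target audio features along a straight-line path, conditioned on text features. This is a generic sketch of the technique, not the BharatGen implementation; all module names, tensor shapes, the conditioning scheme, and hyperparameters are assumptions for illustration only.

import torch
import torch.nn as nn

class VectorFieldNet(nn.Module):
    """Toy stand-in for the audio model: predicts a vector field v(x_t, t | cond)."""
    def __init__(self, feat_dim: int, cond_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + cond_dim + 1, hidden),
            nn.SiLU(),
            nn.Linear(hidden, feat_dim),
        )

    def forward(self, x_t, t, cond):
        # x_t: (B, T, feat_dim); t: (B, 1, 1); cond: (B, T, cond_dim)
        t_feat = t.expand(-1, x_t.size(1), 1)
        return self.net(torch.cat([x_t, cond, t_feat], dim=-1))

def flow_matching_loss(model, x1, cond):
    """Conditional flow-matching loss with a straight-line probability path.

    x1: target audio features (e.g. mel frames), shape (B, T, D)
    cond: conditioning features (e.g. text/phoneme embeddings aligned to frames)
    """
    x0 = torch.randn_like(x1)                      # sample from the simple prior
    t = torch.rand(x1.size(0), 1, 1, device=x1.device)
    x_t = (1.0 - t) * x0 + t * x1                  # point on the straight-line path
    u_t = x1 - x0                                  # target conditional vector field
    v_pred = model(x_t, t, cond)
    return ((v_pred - u_t) ** 2).mean()            # regress predicted field onto u_t

# Hypothetical usage with random tensors, just to show the shape of one training step.
model = VectorFieldNet(feat_dim=80, cond_dim=192)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
x1 = torch.randn(4, 120, 80)       # placeholder mel-spectrogram batch
cond = torch.randn(4, 120, 192)    # placeholder aligned text conditioning
loss = flow_matching_loss(model, x1, cond)
loss.backward()
opt.step()

At inference time, such a model would generate audio features by integrating the learned vector field from prior noise with an ODE solver, conditioned on the text (and, for infilling, on the surrounding audio context); the duration predictor described above would supply the frame-level alignment.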
For any queries, please visit https://bharatgen.discourse.group/invites/BcouFsKk4g
MIT
BharatGen
Text-to-Speech Model
PyTorch
Restricted
Sector Agnostic
29/04/25 07:28:43
N.A
3.17 GB