spk cond tts pflow Hindi

This 150M-parameter non-autoregressive speech generative model, developed by BharatGen, is designed for speaker-conditioned text-to-speech in Hindi.

About Model

Our speaker-conditioned text-to-speech model is a non-autoregressive speech generative model designed for Indian languages. It consists of two key components, an audio model and an enhanced duration predictor, which together comprise approximately 150 million parameters. The audio model is based on continuous normalizing flows (CNFs): it transforms a simple prior distribution into the complex conditional audio distribution p(missing audio | speaker audio, text), using a neural network trained with flow matching, that is, by regressing onto a target vector field. To better handle the prosodic richness of Indian languages, we extend the standard duration predictor architecture. Unlike Voicebox, whose duration predictor uses only text and durations, ours also takes a 3-second speaker prompt along with the text. This lets the duration predictor extract speaker-specific prosodic cues from the reference audio, resulting in more accurate and natural duration estimates.
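The flow-matching training signal described above can be illustrated with a minimal sketch. It assumes the straight-line (optimal-transport) conditional path that is commonly paired with flow matching; the model card does not state which path, noise scale, or ODE solver BharatGen uses, so `SIGMA_MIN`, the function names, and the fixed-step Euler solver are all illustrative assumptions. Plain Python lists stand in for audio feature frames.

```python
SIGMA_MIN = 1e-5  # assumed small constant; not taken from the model card


def ot_path_sample(x0, x1, t, sigma_min=SIGMA_MIN):
    """Point at time t on the straight-line path from noise x0 to data x1."""
    scale = 1.0 - (1.0 - sigma_min) * t
    return [scale * a + t * b for a, b in zip(x0, x1)]


def ot_target_field(x0, x1, sigma_min=SIGMA_MIN):
    """Regression target: the time derivative of ot_path_sample."""
    return [b - (1.0 - sigma_min) * a for a, b in zip(x0, x1)]


def flow_matching_loss(predicted, target, mask):
    """Mean squared error, counted only where mask is 1 (the audio to infill)."""
    errors = [(p - u) ** 2 for p, u, m in zip(predicted, target, mask) if m]
    return sum(errors) / len(errors)


def euler_integrate(field_fn, x0, steps=10):
    """Integrate dx/dt = field_fn(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        v = field_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

During training, a network would predict the field at `ot_path_sample(x0, x1, t)` given the speaker audio and text, and `flow_matching_loss` would compare that prediction with `ot_target_field(x0, x1)`. At inference, `euler_integrate` with the trained network as `field_fn` would carry a noise sample toward an audio sample.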
The model is trained from scratch on publicly available Indian language datasets and is optimized for speech infilling tasks such as continuous sentence completion and cross-sentence completion. Architectural modifications were made throughout to adapt the system for the diverse phonetic, rhythmic, and intonational patterns of Indian languages.
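As a rough illustration of how a duration predictor and an infilling audio model can fit together at inference time, the sketch below expands text tokens by predicted frame counts and builds the prompt-plus-masked-region input. The frame rate, the function names, and the zero fill value are hypothetical; only the 3-second prompt length comes from the description above.

```python
def frames_for_seconds(seconds, frames_per_second):
    """Convert a duration in seconds to a whole number of feature frames."""
    return int(round(seconds * frames_per_second))


def expand_by_duration(tokens, durations):
    """Repeat each text token by its predicted frame count (length regulation)."""
    if len(tokens) != len(durations):
        raise ValueError("tokens and durations must have the same length")
    frames = []
    for token, count in zip(tokens, durations):
        frames.extend([token] * count)
    return frames


def build_infilling_inputs(prompt_frames, target_len, fill_value=0.0):
    """Return (context, mask): prompt frames are kept, target frames are masked.

    mask[i] == 1 marks a frame the audio model has to generate.
    """
    context = list(prompt_frames) + [fill_value] * target_len
    mask = [0] * len(prompt_frames) + [1] * target_len
    return context, mask


# Hypothetical usage: 100 frames per second is an assumed rate.
prompt_len = frames_for_seconds(3.0, 100)
aligned_text = expand_by_duration(["n", "a"], [2, 3])
context, mask = build_infilling_inputs([0.1] * prompt_len, len(aligned_text))
```

In this framing, sentence completion and cross-sentence completion would differ only in where the prompt frames come from: the same utterance as the target text, or a different one.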
For any queries, please visit https://bharatgen.discourse.group/invites/BcouFsKk4g

Metadata

MIT

BharatGen

Text-to-Speech Model

PyTorch

Restricted

BharatGen

Sector Agnostic

29/04/25 06:54:27

N.A

3.46 GB

Activity Overview

  • Downloads0
  • Downloads 1
  • Views 60
  • File Size 3.46 GB

Tags

  • TextToSpeech

License Control

MIT

Version Control

Version 1 (3.46 GB)
  • admin · 9 month(s) ago
    • Folder: Hindi
      • Folder: pipeline2_v2_api
      • checkpoint_1930.pt
      • checkpoint_20200.pt
      • infer.sh
      • readme.txt (text/plain)
      • requirements.txt (text/plain)
