BharatGen - A2TTS-v0.5 : Speaker Adaptive TTS Model (Hindi)

Text-to-speech synthesis model tailored to match a given speaker's voice sample.

BharatGen
KUNDESHWAR

About Model

We present a speaker-adaptive text-to-speech (TTS) system designed for generating high-quality, natural speech across multiple low-resource Indian languages, including Bengali, Gujarati, Hindi, Marathi, Malayalam, Punjabi, Tamil, and Telugu. Built upon a diffusion-based framework with approximately 150 million parameters, our model integrates a speaker encoder and classifier-free guidance to capture speaker-specific characteristics, enabling effective zero-shot adaptation for both seen and unseen speakers. The core architecture extends the GradTTS framework, replacing speaker tags with embeddings derived from a 10-second reference audio sample, which conditions the denoising diffusion probabilistic model (DDPM) decoder for multi-speaker synthesis. To enhance prosody, we introduce an attention-based duration predictor that leverages a reference mel spectrogram alongside text embeddings, extracting speaker-dependent prosodic features and improving the naturalness of speech timing.
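As an illustration of the classifier-free guidance mentioned above, the following is a minimal sketch of one common way to combine a speaker-conditioned and an unconditioned decoder prediction at a single diffusion step. It is not taken from the A2TTS code. The names `guided_score`, `toy_score`, and `guidance_scale` are hypothetical, the toy score function stands in for the real decoder, and the model may use a different guidance convention.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]
# Hypothetical decoder interface: takes the noisy sample and an optional
# speaker embedding (None means "unconditioned") and returns a prediction.
ScoreFn = Callable[[Sequence[float], Optional[Sequence[float]]], Vector]


def guided_score(
    score_fn: ScoreFn,
    x_t: Sequence[float],
    speaker_embedding: Sequence[float],
    guidance_scale: float,
) -> Vector:
    """Blend conditioned and unconditioned predictions.

    Uses the convention  out = uncond + w * (cond - uncond):
    w = 0 ignores the speaker, w = 1 is plain conditioning,
    and w > 1 pushes the output further toward the speaker.
    """
    cond = score_fn(x_t, speaker_embedding)
    uncond = score_fn(x_t, None)
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]


def toy_score(x_t: Sequence[float], speaker_embedding: Optional[Sequence[float]]) -> Vector:
    """Toy stand-in for a decoder: pulls x_t toward the embedding, or toward 0."""
    if speaker_embedding is None:
        return [-x for x in x_t]
    return [s - x for x, s in zip(x_t, speaker_embedding)]


if __name__ == "__main__":
    print(guided_score(toy_score, [0.5, -1.0], [1.0, 1.0], guidance_scale=2.0))
```

In this convention a larger `guidance_scale` trades some diversity for stronger adherence to the reference speaker. The appropriate value for this model is not stated on this page.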

For any queries, please visit https://bharatgen.discourse.group/invites/BcouFsKk4g

Metadata

License: MIT
Hosted By: Ayush Singh Bhadoriya, Abhishek Nikunj Shinde, Pranav Gaikwad, Prof. Ganesh Ramakrishnan
Model Type: Text-to-Speech Model
Model Format: PyTorch
Visibility: Open
Source organisation: BharatGen
Sector: Sector Agnostic
Updated Date & Time: 22/05/25 11:31:41
Created By: Abhay Vijayvargiya
Size: 1.62 GB

Activity Overview

2
136
1,850
1.62 GB

License Control

MIT

Version Control

Version 1 (1.62 GB)

admin·9 month(s) ago
- Hindi
  hifigan.pt
  hindi.pt
  speaker_encoder.pt
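
Before loading the model, it can help to confirm that all three checkpoint files listed above are present after download. The sketch below only checks file names in a local directory. The function name `missing_checkpoints` and the local directory layout are assumptions, and nothing here reflects the actual loading code, which this page does not show.

```python
import tempfile
from pathlib import Path
from typing import List, Union

# File names as listed under Version 1 on this page.
EXPECTED_FILES = ("hifigan.pt", "hindi.pt", "speaker_encoder.pt")


def missing_checkpoints(model_dir: Union[str, Path]) -> List[str]:
    """Return the expected checkpoint file names not found in model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Demo on a temporary directory holding only one of the three files.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "hindi.pt").touch()
        print(missing_checkpoints(tmp))
```

An empty list means every expected file was found. The check says nothing about whether the files are complete or uncorrupted.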

More Models from BharatGen

BharatGen - Param-1-7B-MoE Advancing Multilingual GenAI for India

Param-1-7B-MoE is a multilingual large language model developed under the Param-1 family as part of BharatGen – A Suite of Generative AI Technologies for India. With 7 billion parameters and a Mixture of Experts (MoE) architecture, the model is designed to better understand and generate text across English, Hindi, and 14 additional Indian languages. The model is pretrained from scratch with a strong focus on linguistic diversity, cultural context, and large-scale multilingual representation.

safetensors

mixtral

region:us

Updated 27 day(s) ago

A2TTS-Malayalam Speaker Adaptive TTS (Text-to-Speech)-v0.5

Text-to-speech synthesis model tailored to match a given speaker's voice sample.

SpeechSynthesis

multilingual-TTS

TextToSpeech

0
6
1.62 GB
912

Updated 6 month(s) ago

A2TTS-Gujarati Speaker Adaptive TTS (Text-to-Speech)-v0.5

Text-to-speech synthesis model tailored to match a given speaker's voice sample.

SpeechSynthesis

TextToSpeech

multilingual-TTS

0
7
1.62 GB
1,636

Updated 6 month(s) ago

A2TTS-Telugu Speaker Adaptive TTS (Text-to-Speech)-v0.5

Text-to-speech synthesis model tailored to match a given speaker's voice sample.

SpeechSynthesis

multilingual-TTS

TextToSpeech

0
14
1.62 GB
885

Updated 6 month(s) ago

A2TTS-Tamil Speaker Adaptive TTS (Text-to-Speech)-v0.5

Text-to-speech synthesis model tailored to match a given speaker's voice sample.

multilingual-TTS

SpeechSynthesis

TextToSpeech

0
28
1.62 GB
1,411

Updated 6 month(s) ago

A2TTS-Punjabi Speaker Adaptive TTS (Text-to-Speech)-v0.5

Text-to-speech synthesis model tailored to match a given speaker's voice sample.

SpeechSynthesis

TextToSpeech

multilingual-TTS

0
29
1.62 GB
839

Updated 6 month(s) ago

A2TTS-Marathi Speaker Adaptive TTS (Text-to-Speech)-v0.5

Text-to-speech synthesis model tailored to match a given speaker's voice sample.

SpeechSynthesis

TextToSpeech

multilingual-TTS

1
10
1.62 GB
1,309

Updated 6 month(s) ago

A2TTS-Kannada Speaker Adaptive TTS (Text-to-Speech)-v0.5

Text-to-speech synthesis model tailored to match a given speaker's voice sample.

multilingual-TTS

SpeechSynthesis

TextToSpeech

0
7
1.62 GB
983

Updated 6 month(s) ago

A2TTS-Bengali Speaker Adaptive TTS (Text-to-Speech)-v0.5

Text-to-speech synthesis model tailored to match a given speaker's voice sample.

TextToSpeech

SpeechSynthesis

multilingual-TTS

1
6
1.62 GB
2,991

Updated 6 month(s) ago

BharatGen - Patram 7B Instruct

India's First Vision-Language Model for Documents

indian-documents

visual-document-understanding

BharatGen

license:apache-2.0

1
97
0
1,425

Updated 7 month(s) ago
