A2TTS-Bengali Speaker Adaptive TTS (Text-to-Speech)-v0.5

Text-to-speech synthesis model tailored to match a given speaker's voice sample.

About Model

We present a speaker-adaptive text-to-speech (TTS) system for generating high-quality, natural speech in multiple low-resource Indian languages, including Bengali, Gujarati, Hindi, Marathi, Malayalam, Punjabi, Tamil, and Telugu. The model is diffusion-based, with approximately 150 million parameters, and combines a speaker encoder with classifier-free guidance to capture speaker-specific characteristics, enabling effective zero-shot adaptation for both seen and unseen speakers. The core architecture extends the Grad-TTS framework: instead of speaker tags, it uses embeddings derived from a 10-second reference audio sample, and these embeddings condition the denoising diffusion probabilistic model (DDPM) decoder for multi-speaker synthesis. To improve prosody, we introduce an attention-based duration predictor that takes a reference mel spectrogram alongside the text embeddings, extracting speaker-dependent prosodic features and making speech timing more natural.
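
As an illustration of the classifier-free guidance mentioned above, the following minimal sketch shows the commonly used formulation, in which a denoiser is evaluated once with the speaker embedding and once without it, and the two predictions are blended by a guidance scale. This is an assumption-laden sketch, not the A2TTS implementation: the names `guided_noise`, `toy_denoiser`, and `guidance_scale` are hypothetical, the toy denoiser stands in for a trained network, and the exact guidance weighting used by the model is not stated in this card.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]
# Hypothetical denoiser signature: (noisy input, timestep, optional speaker embedding) -> prediction
Denoiser = Callable[[Sequence[float], float, Optional[Sequence[float]]], Vector]


def guided_noise(
    denoiser: Denoiser,
    x_t: Sequence[float],
    t: float,
    speaker_emb: Sequence[float],
    guidance_scale: float,
) -> Vector:
    """Blend conditional and unconditional predictions (standard CFG form).

    guidance_scale = 0 gives the unconditional prediction, 1 gives the
    conditional one, and values above 1 push further toward the speaker.
    """
    eps_cond = denoiser(x_t, t, speaker_emb)   # conditioned on the reference speaker
    eps_uncond = denoiser(x_t, t, None)        # speaker embedding dropped
    return [u + guidance_scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


def toy_denoiser(
    x_t: Sequence[float], t: float, speaker_emb: Optional[Sequence[float]]
) -> Vector:
    """Stand-in for a trained network (assumption): adds a speaker-dependent bias."""
    bias = 0.0 if speaker_emb is None else sum(speaker_emb) / len(speaker_emb)
    return [t * x + bias for x in x_t]


if __name__ == "__main__":
    x_t = [1.0, -2.0, 0.5]     # toy "noisy mel" values
    speaker = [0.2, 0.4]       # toy speaker embedding
    for scale in (0.0, 1.0, 2.0):
        print(scale, guided_noise(toy_denoiser, x_t, 0.5, speaker, scale))
```

In a real system the denoiser would be the trained diffusion decoder and the speaker embedding would come from the speaker encoder applied to the reference audio; the guidance scale would then be a tunable inference-time parameter.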

For any queries, please visit https://bharatgen.discourse.group/invites/BcouFsKk4g

Metadata

MIT

Ayush Singh Bhadoriya, Abhishek Nikunj Shinde, Pranav Gaikwad, Prof. Ganesh Ramakrishnan

Text-to-Speech Model

PyTorch

Restricted

BharatGen

Sector Agnostic

29/04/25 10:34:36

1.62 GB

Activity Overview

  • Downloads1
  • Downloads 6
  • Views 2,999
  • File Size 1.62 GB

Tags

  • multilingual-TTS
  • TextToSpeech
  • SpeechSynthesis

License Control

MIT

Version Control

Version 1 (1.62 GB)
  • admin · 9 month(s) ago
    • Folder: Bengali
      • bengali.pt
      • hifigan.pt
      • speaker_encoder.pt
