Bhashini - FastSpeech2 Model using Hybrid Segmentation (HS)

Text-to-speech models trained using FastPitch and HiFi-GAN vocoder, separately for each language. Supports both 'female' and 'male' voices.

About Model

This repository contains FastSpeech2 models for 16 Indian languages, with male and female voices for each, built using Hybrid Segmentation (HS) for speech synthesis. The model generates mel-spectrograms from text input, which can then be used to synthesize speech.
FastSpeech2 (FS2) consists of 6 feed-forward Transformer blocks, each combining multi-head self-attention and 1D convolution, in both the phoneme encoder and the mel-spectrogram decoder. In each feed-forward Transformer block, the hidden size of the multi-head attention is 256 and the number of heads is 2. The two-layer convolution network uses kernel sizes of 9 and 1, with input/output channel sizes of 256/1024 for the first layer and 1024/256 for the second. The duration predictor and variance adaptor are each composed of a stack of convolution layers followed by a final linear projection layer. The duration predictor uses 2 convolution layers and the variance adaptor uses 5; in both, the kernel size is 3, the input/output size of every layer is 256/256, and the dropout rate is 0.5.
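
The hyperparameters above can be collected into a small configuration object to sanity-check the sizes they imply. The following is a minimal sketch, not the repository's actual configuration: the class name, field names, and helper functions are hypothetical, and the parameter count assumes standard 1D convolutions with a bias term.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FastSpeech2Config:
    """Values taken from the model description; field names are hypothetical."""
    num_fft_blocks: int = 6            # feed-forward Transformer blocks
    hidden_size: int = 256             # multi-head attention hidden size
    num_heads: int = 2                 # attention heads
    conv_filter_size: int = 1024       # inner channels of the 2-layer conv network
    conv_kernel_sizes: tuple = (9, 1)  # kernel sizes of the two conv layers
    duration_predictor_layers: int = 2
    variance_adaptor_layers: int = 5
    predictor_kernel_size: int = 3
    predictor_dropout: float = 0.5


def conv1d_params(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Weights plus biases of a standard 1D convolution (bias is an assumption)."""
    return in_channels * out_channels * kernel_size + out_channels


def feed_forward_params(cfg: FastSpeech2Config) -> int:
    """Parameters of the 256 -> 1024 -> 256 convolution network in one block."""
    k1, k2 = cfg.conv_kernel_sizes
    first = conv1d_params(cfg.hidden_size, cfg.conv_filter_size, k1)
    second = conv1d_params(cfg.conv_filter_size, cfg.hidden_size, k2)
    return first + second


def head_dim(cfg: FastSpeech2Config) -> int:
    """Per-head dimension when the hidden size is split evenly across heads."""
    return cfg.hidden_size // cfg.num_heads


cfg = FastSpeech2Config()
print(head_dim(cfg))             # 128
print(feed_forward_params(cfg))  # 2622720
```

Under these assumptions each attention head works on 128 dimensions, and the convolution network in a single block holds roughly 2.6 million parameters, most of them in the kernel-size-9 layer.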

Metadata

MIT

SMT Lab IIT Madras

Speech Synthesis (TTS) Model

Other

Open

Sector Agnostic

01/05/25 06:47:30

286.72 MB

Activity Overview

  • Downloads 64
  • Views 932
  • File Size 286.72 MB

Tags

  • Multilingual
  • NLP
  • Text Processing
  • Transformer
  • Text to Speech
  • Language Detection

License Control

MIT

Version Control

Version 2 (286.72 MB)
  • admin · 1 year(s) ago
    • assamese/
      • female/
      • male/
    • bengali/
    • bodo/
    • charmap/
    • english/
    • .gitattributes
    • api.py
    • app.py
    • environment.yml
    • get_phone_mapped_python.py
    • 25 more
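
The folder listing suggests one directory per language with a subfolder per voice. A minimal sketch of resolving such a directory is shown below. It assumes that every language folder follows the `female`/`male` layout shown for `assamese`; the function name `voice_dir` and the `root` argument are hypothetical and not part of the repository.

```python
from pathlib import Path

# Voice subfolders as they appear under "assamese" in the listing.
VOICES = ("female", "male")


def voice_dir(root: str, language: str, gender: str) -> Path:
    """Build the path <root>/<language>/<gender> (layout is an assumption)."""
    gender = gender.lower()
    if gender not in VOICES:
        raise ValueError(f"gender must be one of {VOICES}, got {gender!r}")
    return Path(root) / language.lower() / gender


print(voice_dir("Fastspeech2_HS", "Assamese", "male").as_posix())
# Fastspeech2_HS/assamese/male
```

The helper only builds a path; whether the directory exists and which files it contains should be checked against the downloaded repository.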
