Vits_rasa_13 Open-source Text-to-Speech Model - Free Support for 13 Indian Languages and Diverse Emotional Expressions

Vits Rasa 13

Developed by ai4bharat

A VITS-based text-to-speech model supporting 13 Indian languages with diverse speaking styles and emotional expressions

Speech Synthesis

Transformers

Tags: Multilingual Indian TTS, Emotional Speech Synthesis, Conversational AI Optimization

Downloads 462

Release Time : 12/31/2024

Model Overview

This model is designed specifically for Indian languages. It supports multiple languages and emotional styles, and it suits conversational AI, audiobooks, and similar applications.

Model Features

Multilingual Support

Supports speech synthesis for 13 Indian languages

Emotional Expression

Provides a range of speaking styles and emotional expressions; the style table in the Documentation section lists 14 named styles with IDs between 0 and 16

Diverse Voice Profiles

Includes 20 predefined male and female voice configurations

Model Capabilities

Text-to-Speech

Multilingual Speech Synthesis

Emotional Speech Generation

Speaker Style Control

Use Cases

Conversational AI

Smart Voice Assistants

Provides localized voice interaction experiences for users in India

Audiobook Content Creation

Multilingual Audiobooks

Generates audiobook content in various Indian languages

🚀 VITS TTS for Indian Languages

This repository houses a VITS-based Text-to-Speech (TTS) model that has been fine-tuned for Indian languages. The model supports multiple Indian languages, a wide array of speaking styles, and emotions. It is well-suited for diverse applications such as conversational AI, audiobooks, and more.

🚀 Quick Start

This VITS-based TTS model is designed to convert text into speech for multiple Indian languages. It offers various speaking styles and emotions, enhancing the user experience in different scenarios.

✨ Features

The model ai4bharat/vits_rasa_13 is based on the VITS architecture and comes with the following features:

Languages: Supports the 13 Indian languages listed under Supported Languages.
Styles: Offers various speaking styles and emotions, each selected by a numeric style ID.
Speaker IDs: Has 20 predefined speaker profiles (IDs 0 to 19) covering male and female voices.

📦 Installation

pip install transformers torch soundfile
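
The usage example in this card imports soundfile in addition to transformers and torch, so it can help to confirm that all three are importable before loading the model. The following is a minimal sketch using only the Python standard library; the helper name `missing_modules` is hypothetical and not part of the model's repository.

```python
import importlib.util

# Top-level modules imported by the usage example in this card.
REQUIRED_MODULES = ["transformers", "torch", "soundfile"]


def missing_modules(names):
    """Return the module names from `names` that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

If anything is reported missing, installing it with pip before running the usage example avoids an ImportError at load time.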

💻 Usage Examples

Basic Usage

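import soundfile as sf
import torch
from transformers import AutoModel, AutoTokenizer

# Use the GPU when one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# trust_remote_code=True lets transformers run the model code shipped in the repository.
model = AutoModel.from_pretrained("ai4bharat/vits_rasa_13", trust_remote_code=True).to(device)
tokenizer = AutoTokenizer.from_pretrained("ai4bharat/vits_rasa_13", trust_remote_code=True)

text = "ਕੀ ਮੈਂ ਇਸ ਹਫਤੇ ਦੇ ਅੰਤ ਵਿੱਚ ਰੁੱਝਿਆ ਹੋਇਆ ਹਾਂ?"  # Example text in Punjabi
speaker_id = 16  # PAN_M
style_id = 0  # ALEXA

inputs = tokenizer(text=text, return_tensors="pt").to(device)
with torch.no_grad():  # inference only, so gradients are not needed
    outputs = model(inputs['input_ids'], speaker_id=speaker_id, emotion_id=style_id)

# soundfile expects a NumPy array in host memory, not a (possibly CUDA) tensor.
waveform = outputs.waveform.squeeze().cpu().numpy()
sf.write("audio.wav", waveform, model.config.sampling_rate)
print(outputs.waveform.shape)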

📚 Documentation

Supported Languages

Assamese
Bengali
Bodo
Dogri
Kannada
Maithili
Malayalam
Marathi
Nepali
Punjabi
Sanskrit
Tamil
Telugu

Speaker-Style Identifier Overview

Speaker Name	Speaker ID
ASM_F	0
ASM_M	1
BEN_F	2
BEN_M	3
BRX_F	4
BRX_M	5
DOI_F	6
DOI_M	7
KAN_F	8
KAN_M	9
MAI_M	10
MAL_F	11
MAR_F	12
MAR_M	13
NEP_F	14
PAN_F	15
PAN_M	16
SAN_M	17
TAM_F	18
TEL_F	19
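
The speaker table above can be kept in code as a small lookup so that IDs are not hard-coded at call sites. The sketch below copies the table verbatim; the mapping from language names to three-letter prefixes is an assumption inferred from the speaker names (the card itself only confirms PAN for Punjabi), and the helper `speakers_for_language` is hypothetical.

```python
# Speaker names and IDs copied from the speaker table in this card.
SPEAKER_IDS = {
    "ASM_F": 0, "ASM_M": 1, "BEN_F": 2, "BEN_M": 3, "BRX_F": 4,
    "BRX_M": 5, "DOI_F": 6, "DOI_M": 7, "KAN_F": 8, "KAN_M": 9,
    "MAI_M": 10, "MAL_F": 11, "MAR_F": 12, "MAR_M": 13, "NEP_F": 14,
    "PAN_F": 15, "PAN_M": 16, "SAN_M": 17, "TAM_F": 18, "TEL_F": 19,
}

# Assumption: each prefix is the three-letter code of one supported language.
LANGUAGE_PREFIX = {
    "Assamese": "ASM", "Bengali": "BEN", "Bodo": "BRX", "Dogri": "DOI",
    "Kannada": "KAN", "Maithili": "MAI", "Malayalam": "MAL",
    "Marathi": "MAR", "Nepali": "NEP", "Punjabi": "PAN",
    "Sanskrit": "SAN", "Tamil": "TAM", "Telugu": "TEL",
}


def speakers_for_language(language):
    """Return {speaker_name: speaker_id} for one supported language."""
    prefix = LANGUAGE_PREFIX[language] + "_"
    return {name: sid for name, sid in SPEAKER_IDS.items()
            if name.startswith(prefix)}


if __name__ == "__main__":
    print(speakers_for_language("Punjabi"))
```

Note that, according to the table, not every language has both a male and a female voice, so callers should check which speakers the lookup returns rather than assume a pair.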

Style Name	Style ID
ALEXA	0
ANGER	1
BB	2
BOOK	3
CONV	4
DIGI	5
DISGUST	6
FEAR	7
HAPPY	8
NEWS	10
SAD	12
SURPRISE	14
UMANG	15
WIKI	16
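
The style IDs in the table above are not contiguous: 9, 11, and 13 do not appear. A name-based lookup avoids passing one of those unlisted values by accident. This is a minimal sketch built only from the table; the helper names `style_id_for` and `is_listed_style_id` are hypothetical, and how the model behaves for unlisted IDs is not documented here.

```python
# Style names and IDs copied from the style table in this card.
STYLE_IDS = {
    "ALEXA": 0, "ANGER": 1, "BB": 2, "BOOK": 3, "CONV": 4, "DIGI": 5,
    "DISGUST": 6, "FEAR": 7, "HAPPY": 8, "NEWS": 10, "SAD": 12,
    "SURPRISE": 14, "UMANG": 15, "WIKI": 16,
}


def style_id_for(name):
    """Look up a style ID by name, ignoring case and surrounding spaces."""
    key = name.strip().upper()
    if key not in STYLE_IDS:
        raise ValueError(f"Unknown style {name!r}; expected one of {sorted(STYLE_IDS)}")
    return STYLE_IDS[key]


def is_listed_style_id(value):
    """Return True only if the ID appears in the style table."""
    return value in STYLE_IDS.values()


if __name__ == "__main__":
    print(style_id_for("happy"))
    print(is_listed_style_id(9))
```

The returned integer is what the usage example passes as `emotion_id`.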

📄 License

This project is licensed under the CC-BY-4.0 license.

📚 Citation

If you use this model in your research, please cite:

@article{ai4bharat_vits_rasa_13,
  title={VITS TTS for Indian Languages},
  author={Ashwin Sankar},
  year={2024},
  publisher={Hugging Face}
}
