🚀 VITS TTS for Indian Languages
This repository houses a VITS-based Text-to-Speech (TTS) model fine-tuned for Indian languages. The model supports multiple Indian languages and a wide range of speaking styles and emotions, making it well suited to applications such as conversational AI and audiobooks.
🚀 Quick Start
This VITS-based TTS model is designed to convert text into speech for multiple Indian languages. It offers various speaking styles and emotions, enhancing the user experience in different scenarios.
✨ Features
The model ai4bharat/vits_rasa_13 is based on the VITS architecture and comes with the following features:
- Languages: Supports multiple Indian languages.
- Styles: Offers various speaking styles and emotions.
- Speaker IDs: Has predefined speaker profiles for male and female voices.
📦 Installation
pip install transformers torch soundfile
💻 Usage Examples
Basic Usage
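import soundfile as sf
from transformers import AutoModel, AutoTokenizer

# Load the model and tokenizer; trust_remote_code=True runs the custom code shipped with the model repository.
model = AutoModel.from_pretrained("ai4bharat/vits_rasa_13", trust_remote_code=True).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("ai4bharat/vits_rasa_13", trust_remote_code=True)

text = "ਕੀ ਮੈਂ ਇਸ ਹਫਤੇ ਦੇ ਅੰਤ ਵਿੱਚ ਰੁੱਝਿਆ ਹੋਇਆ ਹਾਂ?"  # Example text in Punjabi
speaker_id = 16  # PAN_M
style_id = 0  # ALEXA

# Tokenize the text, then synthesize speech with the chosen speaker and style.
inputs = tokenizer(text=text, return_tensors="pt").to("cuda")
outputs = model(inputs['input_ids'], speaker_id=speaker_id, emotion_id=style_id)

# Move the waveform to the CPU and convert it to NumPy before writing the WAV file.
waveform = outputs.waveform.squeeze().detach().cpu().numpy()
sf.write("audio.wav", waveform, model.config.sampling_rate)
print(outputs.waveform.shape)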
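The basic example assumes a CUDA-capable GPU. As a minimal sketch for machines without one, the device string can be chosen at runtime and passed to `.to(...)` in place of the hard-coded "cuda". The helper name `pick_device` is hypothetical and not part of this repository.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # torch is listed under Installation; this fallback only keeps
        # the helper importable in environments where it is missing.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
# Then use it wherever the basic example hard-codes "cuda", for example:
#   model = AutoModel.from_pretrained(..., trust_remote_code=True).to(device)
#   inputs = tokenizer(text=text, return_tensors="pt").to(device)
print(device)
```

Synthesis on the CPU should produce the same kind of output, though it will likely be slower.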
📚 Documentation
Supported Languages
Assamese
Bengali
Bodo
Dogri
Kannada
Maithili
Malayalam
Marathi
Nepali
Punjabi
Sanskrit
Tamil
Telugu
Speaker-Style Identifier Overview
| Speaker Name | Speaker ID |
|---|---|
| ASM_F | 0 |
| ASM_M | 1 |
| BEN_F | 2 |
| BEN_M | 3 |
| BRX_F | 4 |
| BRX_M | 5 |
| DOI_F | 6 |
| DOI_M | 7 |
| KAN_F | 8 |
| KAN_M | 9 |
| MAI_M | 10 |
| MAL_F | 11 |
| MAR_F | 12 |
| MAR_M | 13 |
| NEP_F | 14 |
| PAN_F | 15 |
| PAN_M | 16 |
| SAN_M | 17 |
| TAM_F | 18 |
| TEL_F | 19 |

| Style Name | Style ID |
|---|---|
| ALEXA | 0 |
| ANGER | 1 |
| BB | 2 |
| BOOK | 3 |
| CONV | 4 |
| DIGI | 5 |
| DISGUST | 6 |
| FEAR | 7 |
| HAPPY | 8 |
| NEWS | 10 |
| SAD | 12 |
| SURPRISE | 14 |
| UMANG | 15 |
| WIKI | 16 |
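
Passing raw integers for `speaker_id` and `emotion_id` is easy to get wrong, so the sketch below transcribes the speaker and style tables above into dictionaries and resolves names to IDs. The helper names (`resolve_ids`, `speakers_for_language`) are hypothetical and not part of the model's API, and the mapping from three-letter speaker prefixes to language names is an assumption inferred from the Supported Languages list.

```python
# IDs transcribed from the speaker and style tables above.
SPEAKER_IDS = {
    "ASM_F": 0, "ASM_M": 1, "BEN_F": 2, "BEN_M": 3, "BRX_F": 4,
    "BRX_M": 5, "DOI_F": 6, "DOI_M": 7, "KAN_F": 8, "KAN_M": 9,
    "MAI_M": 10, "MAL_F": 11, "MAR_F": 12, "MAR_M": 13, "NEP_F": 14,
    "PAN_F": 15, "PAN_M": 16, "SAN_M": 17, "TAM_F": 18, "TEL_F": 19,
}
STYLE_IDS = {
    "ALEXA": 0, "ANGER": 1, "BB": 2, "BOOK": 3, "CONV": 4, "DIGI": 5,
    "DISGUST": 6, "FEAR": 7, "HAPPY": 8, "NEWS": 10, "SAD": 12,
    "SURPRISE": 14, "UMANG": 15, "WIKI": 16,
}
# Assumption: each speaker prefix corresponds to one supported language.
LANGUAGE_PREFIXES = {
    "Assamese": "ASM", "Bengali": "BEN", "Bodo": "BRX", "Dogri": "DOI",
    "Kannada": "KAN", "Maithili": "MAI", "Malayalam": "MAL",
    "Marathi": "MAR", "Nepali": "NEP", "Punjabi": "PAN",
    "Sanskrit": "SAN", "Tamil": "TAM", "Telugu": "TEL",
}


def resolve_ids(speaker_name, style_name):
    """Return (speaker_id, style_id) for the given table names."""
    speaker_key = speaker_name.upper()
    style_key = style_name.upper()
    if speaker_key not in SPEAKER_IDS:
        raise KeyError(f"Unknown speaker: {speaker_name!r}")
    if style_key not in STYLE_IDS:
        raise KeyError(f"Unknown style: {style_name!r}")
    return SPEAKER_IDS[speaker_key], STYLE_IDS[style_key]


def speakers_for_language(language):
    """Return the {speaker_name: speaker_id} entries for one language."""
    prefix = LANGUAGE_PREFIXES[language] + "_"
    return {name: sid for name, sid in SPEAKER_IDS.items()
            if name.startswith(prefix)}


speaker_id, style_id = resolve_ids("PAN_M", "ALEXA")
print(speaker_id, style_id)
print(speakers_for_language("Punjabi"))
```

The resolved `speaker_id` and `style_id` can then be passed to the model call as in Basic Usage. Note that not every language has both a male and a female speaker, and the style IDs are not contiguous.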
📄 License
This project is licensed under the CC-BY-4.0 license.
📚 Citation
If you use this model in your research, please cite:
@article{ai4bharat_vits_rasa_13,
title={VITS TTS for Indian Languages},
author={Ashwin Sankar},
year={2024},
publisher={Hugging Face}
}