XTTS-v2-Urdu-FT Open-Source TTS Model - Free Urdu Text-to-Speech and Voice Cloning

XTTS V2 Urdu FT

Developed by suhaibrashid17

A TTS model supporting Urdu text-to-speech and voice cloning

Category: Speech Synthesis (Open Source)
License: MIT
Tags: Urdu TTS, Voice Cloning, Multilingual Support

Downloads 70

Release Date: 12/11/2024

Model Overview

The model converts Urdu text into natural-sounding speech. Given a reference audio clip, it can also clone a voice, generating speech whose timbre resembles that of the reference speaker.

Model Features

Urdu Language Support

Speech synthesis capabilities specifically optimized for Urdu

Voice Cloning

Clones a speaker's timbre from a reference audio clip

High-Quality Synthesis

Generates high-quality, natural-sounding speech

Model Capabilities

Text-to-Speech

Voice Cloning

Multi-Speaker Speech Synthesis

Use Cases

Voice Applications

Audiobook Generation

Convert Urdu text into audiobooks

Natural and fluent speech output

Voice Assistants

Provide speech synthesis capabilities for Urdu voice assistants

Customizable voice responses with different timbres

Voice Cloning Service

Clone a specific speaker's voice style

Synthesized speech that retains the original speaker's timbre characteristics

🚀 Urdu TTS Model

This is an Urdu Text-to-Speech (TTS) model that supports voice cloning. It can convert Urdu text into speech with a cloned voice.

🚀 Quick Start

This section provides a guide on how to use the Urdu TTS model, including installation steps, usage examples, and important notes.

📦 Installation

1. Install the coqui-tts library using pip:

   pip install coqui-tts

2. Locate the TTS/tts/layers/xtts/tokenizers.py file inside the site-packages directory of your Python environment.
3. Replace that tokenizers.py file with the one from this repository.

You're all set!
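The locate-and-replace steps above can be scripted. The following is a minimal sketch, not part of the original repository: the helper names `find_installed_tokenizer` and `replace_tokenizer` are hypothetical, and it assumes the installed package is importable under the top-level name `TTS`, as the path above suggests. It keeps a backup of the original file so the change can be undone.

```python
import importlib.util
import shutil
from pathlib import Path
from typing import Optional


def find_installed_tokenizer(package: str = "TTS") -> Optional[Path]:
    """Return the expected path of TTS/tts/layers/xtts/tokenizers.py,
    or None if the package is not installed in this environment."""
    # find_spec locates a top-level package without importing it
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    root = Path(list(spec.submodule_search_locations)[0])
    return root / "tts" / "layers" / "xtts" / "tokenizers.py"


def replace_tokenizer(installed: Path, replacement: Path) -> Path:
    """Back up the installed file as tokenizers.py.bak (once),
    then copy the replacement over it. Returns the backup path."""
    backup = installed.with_name(installed.name + ".bak")
    if not backup.exists():
        shutil.copy2(installed, backup)
    shutil.copy2(replacement, installed)
    return backup


# Example usage (the replacement path is an assumption; point it at the
# tokenizers.py you downloaded from this repository):
# target = find_installed_tokenizer()
# if target is not None and target.exists():
#     replace_tokenizer(target, Path("tokenizers.py"))
```

Reinstalling or upgrading coqui-tts will overwrite the patched file, so the replacement step has to be repeated afterwards.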

💻 Usage Examples

Source Voice

[Audio sample: source (reference) voice]

Generated Voice

[Audio samples: generated voices]

Inference Code

import torch
from tqdm import tqdm
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Paths to the downloaded model files
xtts_checkpoint = "model.pth"
xtts_config = "config.json"
xtts_vocab = "vocab.json"

# Load the configuration, build the model, and restore the fine-tuned weights
config = XttsConfig()
config.load_json(xtts_config)
XTTS_MODEL = Xtts.init_from_config(config)
XTTS_MODEL.load_checkpoint(config, checkpoint_path=xtts_checkpoint, vocab_path=xtts_vocab, use_deepspeed=False)
XTTS_MODEL.to(device)

print("Model loaded successfully!")

# Optional: if the reference clip is a WhatsApp voice note (OGG), convert it to WAV first.
# pydub relies on ffmpeg to decode OGG files.
from pydub import AudioSegment

audio = AudioSegment.from_file("input-4.ogg", format="ogg")
audio.export("output.wav", format="wav")
print("Conversion complete!")

# Inference
tts_text = "یہ ٹی ٹی ایس کیسا ہے؟ اس کے بارے میں کچھ بتائیں"
speaker_audio_file = "output.wav"  # reference clip whose voice will be cloned
lang = "ur"

# Compute the speaker conditioning from the reference audio
gpt_cond_latent, speaker_embedding = XTTS_MODEL.get_conditioning_latents(
    audio_path=[speaker_audio_file],
    gpt_cond_len=XTTS_MODEL.config.gpt_cond_len,
    max_ref_length=XTTS_MODEL.config.max_ref_len,
    sound_norm_refs=XTTS_MODEL.config.sound_norm_refs,
)

tts_texts = [tts_text]
wav_chunks = []
for text in tqdm(tts_texts):
    wav_chunk = XTTS_MODEL.inference(
        text=text,
        language=lang,
        gpt_cond_latent=gpt_cond_latent,
        speaker_embedding=speaker_embedding,
        temperature=0.1,
        length_penalty=0.1,
        repetition_penalty=10.0,
        top_k=10,
        top_p=0.3,
    )
    wav_chunks.append(torch.tensor(wav_chunk["wav"]))

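# Join the chunks into a single (1, num_samples) tensor on the CPU
out_wav = torch.cat(wav_chunks, dim=0).unsqueeze(0).cpu()

# Play the result inline (requires a Jupyter/IPython notebook); the audio is at 24 kHz
from IPython.display import Audio
Audio(out_wav, rate=24000)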
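Outside a notebook, the inline player above is not available, so the result has to be written to disk instead. The sketch below is one way to do that using only the Python standard library. The helper name `save_wav` and the output filename are hypothetical, and it assumes the samples are mono floats in the range [-1, 1] at the 24000 Hz rate used above.

```python
import struct
import wave
from typing import Iterable


def save_wav(path: str, samples: Iterable[float], sample_rate: int = 24000) -> int:
    """Write mono float samples in [-1, 1] as 16-bit PCM WAV.
    Values outside the range are clipped. Returns the number of frames written."""
    pcm = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples]
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)      # mono
        wav_file.setsampwidth(2)      # 16-bit samples
        wav_file.setframerate(sample_rate)
        # "<h" = little-endian signed 16-bit, as the WAV format expects
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    return len(pcm)


# Example usage with the tensor from the inference code (filename is an assumption):
# save_wav("urdu_tts_output.wav", out_wav.squeeze(0).tolist())
```

If torchaudio is installed, its own save function is a more direct alternative for tensors; the standard-library version is shown only because it has no extra dependencies.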

⚠️ Important Note

The model may perform poorly on very long inputs. For longer text, write your own text splitter that breaks the input into shorter sentences, then synthesize them one at a time.
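A text splitter of the kind described in the note could look like the following. This is a minimal sketch, not the author's method: the function name `split_urdu_text` and the 200-character default are assumptions, and it simply splits after the Urdu full stop (۔), the Arabic question mark (؟), and the ASCII marks . ! ? before packing sentences into chunks.

```python
import re
from typing import List

# Split at whitespace that follows a sentence-ending mark:
# Urdu full stop (U+06D4), Arabic question mark (U+061F), or ASCII . ! ?
_SENTENCE_END = re.compile(r"(?<=[\u06D4\u061F.!?])\s+")


def split_urdu_text(text: str, max_chars: int = 200) -> List[str]:
    """Split text into sentences, then pack consecutive sentences into
    chunks of at most max_chars characters. A single sentence longer
    than max_chars is kept whole as its own chunk."""
    sentences = [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: start a new chunk
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

In the inference code, the resulting list can take the place of the single-item list, for example `tts_texts = split_urdu_text(tts_text)`, so that each chunk is synthesized separately and the chunks are concatenated afterwards. The best chunk length for this model is not stated in the source, so treat `max_chars` as a value to tune.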

📄 License

This project is licensed under the MIT License.
