XTTS-v1 Open-source Speech Generation Model - Free Deployment, 6-second Voice Cloning, Supports Multilingual Applications

XTTS V1

Developed by coqui

ⓍTTS is a voice generation model that can clone voices and apply them to different languages with just a 6-second audio clip.

Speech Synthesis · Open Source · License: Other · Tags: Cross-Language Voice Cloning, 6-Second Rapid Cloning, Multilingual Synthesis

Downloads 5,449

Release Date: 9/13/2023

Model Overview

A cross-language voice cloning and generation model based on the Tortoise architecture, supporting 14 languages and enabling emotion and style transfer.

Model Features

Rapid Voice Cloning

Clones target voice characteristics with just 6 seconds of audio

Cross-Language Support

Supports voice generation and cross-language cloning in 14 languages

Emotion Transfer

Preserves the emotional and stylistic features of the original audio

High-Quality Output

Generates natural speech at a 24 kHz sampling rate

Model Capabilities

Text-to-Speech

Voice Cloning

Cross-Language Voice Generation

Emotion and Style Transfer

Use Cases

Content Creation

Multilingual Audio Content Generation

Quickly generates multilingual voiceovers for videos, podcasts, etc.

Supports multiple language outputs while maintaining consistent voice characteristics

Assistive Technology

Voice Assistive Tools

Creates personalized voice output for individuals with speech impairments

Restores the user's original voice characteristics with minimal samples

🚀 ⓍTTS

ⓍTTS is a voice generation model that clones voices across languages from just a 6-second audio clip. Built on Tortoise, it includes significant model improvements that make cross-language voice cloning and multilingual speech generation straightforward, with no need for many hours of training data. This model powers Coqui Studio and the Coqui API, with optimizations for faster performance and streaming inference.

🚀 Quick Start

The current implementation supports inference and fine-tuning.

✨ Features

Supports 14 languages.
Voice cloning with just a 6-second audio clip.
Emotion and style transfer by cloning.
Cross-language voice cloning.
Multilingual speech generation.
24 kHz sampling rate.
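
Since cloning depends on a short reference clip, it can help to check the clip's length before passing it to the model. The sketch below uses only Python's standard `wave` module and assumes the reference is an uncompressed PCM WAV file. The helper names and the choice to treat six seconds as a lower bound are illustrative assumptions; the model card only says that a 6-second clip is enough.

```python
import wave

# Assumption: six seconds (the clip length quoted above) is used as a rough minimum.
MIN_REFERENCE_SECONDS = 6.0


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_long_enough(path, minimum=MIN_REFERENCE_SECONDS):
    """Return True if the reference clip lasts at least `minimum` seconds."""
    return wav_duration_seconds(path) >= minimum
```

A call such as `is_long_enough("/path/to/target/speaker.wav")` can then gate the cloning step.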

💻 Usage Examples

Basic Usage

from TTS.api import TTS
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v1", gpu=True)

# generate speech by cloning a voice using default settings
tts.tts_to_file(text="It took me quite a long time to develop a voice, and now that I have it I'm not going to be silent.",
                file_path="output.wav",
                speaker_wav="/path/to/target/speaker.wav",
                language="en")

# generate speech by cloning a voice using custom settings
tts.tts_to_file(text="It took me quite a long time to develop a voice, and now that I have it I'm not going to be silent.",
                file_path="output.wav",
                speaker_wav="/path/to/target/speaker.wav",
                language="en",
                decoder_iterations=30)
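
Cross-language cloning uses the same call with a different `language` value and text. The following is a minimal sketch that reuses one reference clip for several languages. `build_jobs` and `synthesize_all` are hypothetical helper names, the `outputs` directory is an arbitrary choice, and only the `tts_to_file` arguments already shown above are used.

```python
from pathlib import Path


def build_jobs(text_by_language, speaker_wav, out_dir):
    """Pair each (language, text) with the same reference clip and its own output file."""
    out_dir = Path(out_dir)
    return [
        {
            "text": text,
            "language": language,
            "speaker_wav": speaker_wav,
            "file_path": str(out_dir / f"output_{language}.wav"),
        }
        for language, text in text_by_language.items()
    ]


def synthesize_all(jobs, model_name="tts_models/multilingual/multi-dataset/xtts_v1", gpu=True):
    """Run every job through a single loaded model (requires the 🐸TTS package)."""
    from TTS.api import TTS  # imported lazily so build_jobs works without 🐸TTS installed

    tts = TTS(model_name, gpu=gpu)
    for job in jobs:
        # Make sure the output directory exists before writing.
        Path(job["file_path"]).parent.mkdir(parents=True, exist_ok=True)
        tts.tts_to_file(**job)


jobs = build_jobs(
    {
        "en": "I don't want to go to school today.",
        "tr": "Bugün okula gitmek istemiyorum.",
    },
    speaker_wav="/path/to/target/speaker.wav",
    out_dir="outputs",
)
# synthesize_all(jobs)  # uncomment once 🐸TTS and the model are available
```

Loading the model once and looping over the jobs avoids paying the model start-up cost for every language.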

Advanced Usage

from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

config = XttsConfig()
config.load_json("/path/to/xtts/config.json")
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="/path/to/xtts/", eval=True)
model.cuda()

outputs = model.synthesize(
    "It took me quite a long time to develop a voice and now that I have it I am not going to be silent.",
    config,
    speaker_wav="/data/TTS-public/_refclips/3.wav",
    gpt_cond_len=3,
    language="en",
)
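
The advanced example stops at `outputs` without writing audio to disk. Below is a hedged sketch of saving the result as a 24 kHz WAV file using only the standard library. It assumes that `outputs["wav"]` holds a one-dimensional sequence of float samples in the range [-1.0, 1.0]; that key and value range are assumptions, not something this document confirms, and `save_wav` is a hypothetical helper.

```python
import array
import sys
import wave

SAMPLE_RATE = 24000  # the 24 kHz rate stated in the feature list


def save_wav(samples, path, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1.0, 1.0] to `path` as 16-bit PCM."""
    # Clip each sample to [-1.0, 1.0] and scale it to the signed 16-bit range.
    pcm = array.array(
        "h", (int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples)
    )
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm.tobytes())


# Assumed usage with the result of model.synthesize(...):
# save_wav(outputs["wav"], "xtts_output.wav")
```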

📚 Documentation

Languages

As of now, XTTS-v1 (v1.1) supports 14 languages: English, Spanish, French, German, Italian, Portuguese, Polish, Turkish, Russian, Dutch, Czech, Arabic, Chinese, and Japanese.

Stay tuned as we continue to add support for more languages. If you have any language requests, please feel free to reach out!
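
The `language` argument in the usage examples takes a short code rather than a language name. The lookup table below is a sketch for validating input before synthesis. Only `en` and `tr` appear in this document; every other code, including `zh-cn` for Chinese, is an assumption that should be checked against the language list in the model's own configuration. `language_code` is a hypothetical helper.

```python
# "en" and "tr" come from the examples in this document; the other codes are assumed.
LANGUAGE_CODES = {
    "English": "en", "Spanish": "es", "French": "fr", "German": "de",
    "Italian": "it", "Portuguese": "pt", "Polish": "pl", "Turkish": "tr",
    "Russian": "ru", "Dutch": "nl", "Czech": "cs", "Arabic": "ar",
    "Chinese": "zh-cn", "Japanese": "ja",
}


def language_code(name):
    """Return the code for a language name (case-insensitive) or raise ValueError."""
    for language, code in LANGUAGE_CODES.items():
        if language.lower() == name.strip().lower():
            return code
    raise ValueError(f"Unsupported language: {name!r}")
```

Failing early with a clear error is cheaper than discovering an unsupported code after the model has been loaded.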

Using the 🐸TTS Command Line

 tts --model_name tts_models/multilingual/multi-dataset/xtts_v1 \
     --text "Bugün okula gitmek istemiyorum." \
     --speaker_wav /path/to/target/speaker.wav \
     --language_idx tr \
     --use_cuda true

📄 License

This model is licensed under the Coqui Public Model License (CPML). There's a lot that goes into a license for generative models, and Coqui has published more on the origin story of CPML.

📞 Contact

Come and join our 🐸Community. We're active on Discord and Twitter. You can also email us at info@coqui.ai.

⚠️ Important Note

The ⓍTTS V2 model is now available: XTTS V2.
