# 🚀 ⓍTTS

ⓍTTS is a voice generation model that enables voice cloning across different languages using just a 6-second audio clip, eliminating the need for extensive training data spanning countless hours. It powers Coqui Studio and Coqui API.
## 🚀 Quick Start

The codebase supports inference and fine-tuning. You can also try the model through the available demo spaces.
## ✨ Features

- Supports 16 languages.
- Enables voice cloning with just a 6-second audio clip.
- Allows emotion and style transfer through cloning.
- Supports cross-language voice cloning.
- Facilitates multilingual speech generation.
- Operates at a 24 kHz sampling rate.
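The clip-length and sampling-rate figures above pin down exactly how much audio the model needs. A small illustrative helper (not part of 🐸TTS) that converts a clip duration to a sample count at the model's rate:

```python
def clip_samples(seconds: float, sample_rate: int = 24_000) -> int:
    """Number of audio samples in a clip of the given duration."""
    return int(seconds * sample_rate)

# A 6-second reference clip at the model's 24 kHz rate:
print(clip_samples(6))  # 144000
```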
## 🆕 Updates over XTTS-v1

- Added support for 2 new languages: Hungarian and Korean.
- Improved the architecture for speaker conditioning.
- Allows the use of multiple speaker references and interpolation between speakers.
- Enhanced stability.
- Improved prosody and audio quality across the board.
## 🌐 Languages

XTTS-v2 supports 16 languages: English (en), Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Polish (pl), Turkish (tr), Russian (ru), Dutch (nl), Czech (cs), Arabic (ar), Chinese (zh-cn), Japanese (ja), Hungarian (hu) and Korean (ko).

Stay tuned as we continue to add support for more languages. If you have any language requests, feel free to reach out!
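When calling the API programmatically, passing an unsupported `language` code is an easy mistake. A minimal sketch of a guard — the code list is transcribed from this section; the helper itself is illustrative and not part of 🐸TTS:

```python
# ISO codes for the 16 languages listed above (note zh-cn for Chinese).
XTTS_V2_LANGUAGES = {
    "en", "es", "fr", "de", "it", "pt", "pl", "tr",
    "ru", "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko",
}

def check_language(code: str) -> str:
    """Return `code` if XTTS-v2 supports it, otherwise raise ValueError."""
    if code not in XTTS_V2_LANGUAGES:
        raise ValueError(f"XTTS-v2 does not support language {code!r}")
    return code
```

Validating up front gives a clear error before any model loading or synthesis work is done.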
## 💻 Usage Examples

### Basic Usage

#### Using 🐸TTS API
```python
from TTS.api import TTS

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2", gpu=True)

# Clone the voice in `speaker_wav` and write English speech to `output.wav`.
tts.tts_to_file(
    text="It took me quite a long time to develop a voice, and now that I have it I'm not going to be silent.",
    file_path="output.wav",
    speaker_wav="/path/to/target/speaker.wav",
    language="en",
)

# The same call with an explicit number of decoder iterations.
tts.tts_to_file(
    text="It took me quite a long time to develop a voice, and now that I have it I'm not going to be silent.",
    file_path="output.wav",
    speaker_wav="/path/to/target/speaker.wav",
    language="en",
    decoder_iterations=30,
)
```
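When synthesizing several sentences with calls like the ones above, each `tts_to_file` call needs its own `file_path`. A small illustrative helper (not part of 🐸TTS) for generating numbered output paths:

```python
from pathlib import Path

def numbered_outputs(out_dir: str, count: int, stem: str = "output") -> list:
    """Build `count` numbered .wav paths like out_dir/output_000.wav."""
    return [str(Path(out_dir) / f"{stem}_{i:03d}.wav") for i in range(count)]

# Example: pair each sentence with its own output file in a loop.
# for text, path in zip(sentences, numbered_outputs("clips", len(sentences))):
#     tts.tts_to_file(text=text, file_path=path,
#                     speaker_wav="/path/to/target/speaker.wav", language="en")
```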
#### Using 🐸TTS Command line

```bash
# Synthesize Turkish speech with a cloned voice.
# The sample text means "I don't want to go to school today."
tts --model_name tts_models/multilingual/multi-dataset/xtts_v2 \
    --text "Bugün okula gitmek istemiyorum." \
    --speaker_wav /path/to/target/speaker.wav \
    --language_idx tr \
    --use_cuda true
```
### Advanced Usage

#### Using the model directly
```python
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

# Load the model configuration and checkpoint.
config = XttsConfig()
config.load_json("/path/to/xtts/config.json")
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="/path/to/xtts/", eval=True)
model.cuda()

# Synthesize speech conditioned on a reference clip.
outputs = model.synthesize(
    "It took me quite a long time to develop a voice and now that I have it I am not going to be silent.",
    config,
    speaker_wav="/data/TTS-public/_refclips/3.wav",
    gpt_cond_len=3,
    language="en",
)
```
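Unlike `tts_to_file`, the direct `synthesize` call returns the raw waveform instead of writing a file. Assuming the returned dictionary exposes the audio as float samples under a `"wav"` key (check your installed version), you can write it as mono 16-bit PCM at the model's 24 kHz rate. This sketch uses a dummy sine wave in place of real model output so it runs standalone:

```python
import math
import struct
import wave

SAMPLE_RATE = 24_000  # XTTS output rate

def save_wav(samples, path, sample_rate=SAMPLE_RATE):
    """Write float samples in [-1, 1] as a mono 16-bit PCM WAV file."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 16-bit
        f.setframerate(sample_rate)
        frames = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        )
        f.writeframes(frames)

# Stand-in for `outputs["wav"]`: one second of a 440 Hz tone.
dummy = [
    0.5 * math.sin(2 * math.pi * 440 * t / SAMPLE_RATE)
    for t in range(SAMPLE_RATE)
]
save_wav(dummy, "demo.wav")
```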
## 📄 License

This model is licensed under the Coqui Public Model License (CPML). There's a lot that goes into a license for generative models, and you can read more about the origin story of the CPML here.
## 📞 Contact

Come join our 🐸Community. We're active on Discord and Twitter. You can also email us at info@coqui.ai.