🚀 OpenF5 TTS Base (Alpha)
OpenF5 TTS is an open-weight text-to-speech model with zero-shot voice cloning, trained with the F5-TTS framework. It differs from the original F5-TTS model mainly in its license: because the training data is permissively licensed, this model is released under the Apache 2.0 license, which allows both commercial and personal use.
🚀 Quick Start
To get started, install F5-TTS, download the model files, and run inference:
pip install f5-tts
huggingface-cli download mrfakename/OpenF5-TTS --local-dir openf5
f5-tts_infer-cli -mc openf5/config.yaml -p openf5/model.pt -v openf5/vocab.txt
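For voice cloning, the same command can be given a reference clip and the text to synthesize. The following is a minimal sketch rather than an official recipe: it assumes the `--ref_audio`, `--ref_text`, `--gen_text`, and `--output_dir` options of `f5-tts_infer-cli` (check `f5-tts_infer-cli --help` for your installed version), and `ref.wav`, its transcript, the `outputs` directory, the `openf5_infer` helper, and the `RUN_INFERENCE` switch are all hypothetical names introduced here for illustration.

```shell
# Sketch: wrap the Quick Start command with reference-clip arguments.
# Assumptions (not from the model card): ref.wav, its transcript, and the
# "outputs" directory are placeholders; openf5_infer is a hypothetical helper.
MODEL_DIR="openf5"   # directory used by the download step above

openf5_infer() {
  # $1 = reference audio, $2 = transcript of that audio, $3 = text to speak
  set -- f5-tts_infer-cli \
    -mc "$MODEL_DIR/config.yaml" \
    -p "$MODEL_DIR/model.pt" \
    -v "$MODEL_DIR/vocab.txt" \
    --ref_audio "$1" \
    --ref_text "$2" \
    --gen_text "$3" \
    --output_dir "outputs"
  if [ "${RUN_INFERENCE:-0}" = "1" ]; then
    "$@"                 # run the model for real
  else
    printf '%s\n' "$@"   # dry run: print one argument per line
  fi
}

# Dry run by default; prefix with RUN_INFERENCE=1 to generate audio.
openf5_infer ref.wav "Transcript of the reference clip." "Hello from OpenF5 TTS."
```

The helper prints the assembled arguments by default so that the command can be inspected before any audio is generated. Because the model was trained only on English speech, English reference clips and text are the safer choice.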
✨ Features
- Zero-shot Voice Cloning: Clones a voice from reference audio without requiring a large amount of reference data.
- Permissive License: Licensed under the Apache 2.0 license, suitable for both commercial and non-commercial use.
- Future Improvements: Upcoming variants will offer better voice cloning capabilities, enhanced emotional speech generation, and more stable performance.
📚 Documentation
Details
This model was trained using the F5-TTS Base V1 model configuration for 1 million steps on the Emilia-YODAS dataset. It was trained only on English speech.
Safety
This model is provided under the Apache 2.0 License. Users are encouraged to consider the ethical and societal impacts of synthetic speech technologies. Potential misuse, such as impersonation, deception, or privacy violation, can cause harm. Although the license allows for broad usage, responsible application is advised, especially in scenarios involving identifiable individuals or public communication. This model aims to support research, accessibility, and creative work in a transparent and accountable manner.
Acknowledgements
- Special thanks to Hugging Face for providing the compute resources to train this model.
- Thanks to the authors of [F5-TTS](https://github.com/SWivid/F5-TTS) for releasing the excellent F5-TTS model and codebase, and to lucidrains for the original open-source [E2-TTS implementation](https://github.com/lucidrains/e2-tts-pytorch).
- Additionally, thanks to Amphion for the wonderful [Emilia-YODAS dataset](https://huggingface.co/datasets/amphion/Emilia-Dataset). Without it, this project would not have been possible.
📄 License
This model is licensed under the Apache 2.0 license. You are free to use it for both commercial and non-commercial purposes.
⚠️ Important Note
This model is still in alpha, so performance and stability may be limited. Its main goal is to provide a permissively licensed base for further fine-tuning. It is currently still inferior to the official NC-licensed F5-TTS model.
💡 Usage Tip
New variants will soon be released with more post-training/fine-tuning. These versions will have better voice cloning capabilities, enhanced emotional speech generation, and more stable performance. The current model is only a base model, so stay tuned for more updates!