# 🚀 F5-TTS Italian Finetune
This is an Italian finetune for F5-TTS, focusing solely on the Italian language.
| Property | Details |
|---|---|
| Datasets | ylacombe/cml-tts |
| Language | it |
| Base Model | SWivid/F5-TTS |
| Pipeline Tag | text-to-speech |
| License | cc-by-4.0 |
| Library Name | f5-tts |
## 🚀 Quick Start
This project is an Italian-only finetune of F5-TTS: it supports Italian exclusively and can no longer speak English properly.
## ✨ Features
- Trained on 73+ hours of the "train" split of the ylacombe/cml-tts dataset using 8× RTX 4090 GPUs. Training is still in progress.
- Finetuned with the Gradio finetuning app using the following settings:

  ```
  exp_name="F5TTS_Base"
  learning_rate=0.00001
  batch_size_per_gpu=10000
  batch_size_type="frame"
  max_samples=64
  grad_accumulation_steps=1
  max_grad_norm=1
  epochs=300
  num_warmup_updates=2000
  save_per_updates=600
  last_per_steps=300
  finetune=true
  file_checkpoint_train=""
  tokenizer_type="char"
  tokenizer_file=""
  mixed_precision="fp16"
  logger="wandb"
  bnb_optimizer=false
  ```
## 📚 Documentation
### Preprocessing
The transcriptions extracted from the data source were preprocessed. Punctuation was preserved, as it is important for teaching pauses and proper intonation. The original Italian "text" field contained dialogue dashes (long hyphens marking direct speech), which were also preserved. However, hyphens used to split a word across a line break were removed so that the two parts of the word were merged back together, as training on such artifacts did not affect the resulting speech. This applies only to the Italian data in the cml-tts dataset; whether other languages are affected is unknown.
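The hyphen-merge step described above can be sketched as follows. This is a minimal illustration only: the actual preprocessing script is not published, and the exact regex is an assumption.

```python
import re

def merge_linebreak_hyphens(text: str) -> str:
    """Rejoin words split across a line break by a hyphen,
    e.g. 'paro-\\nla' -> 'parola' (illustrative sketch, not the
    project's actual preprocessing code).

    Dialogue dashes (em dashes, or hyphens not glued between two
    word characters) are left untouched.
    """
    return re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
```

Only a hyphen directly between two word characters across a newline is merged, so direct-speech dashes in the transcripts survive unchanged.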
### Current most-trained model
The most trained checkpoint is model_159600.safetensors (~290 epochs).
### Known problems
- Catastrophic forgetting: since the model was finetuned on Italian only, it has lost its English capabilities. A proper multilanguage dataset should be used instead of a single-language one.
- Imperfect pronunciation: the pronunciation is not yet perfect.
- Number conversion: numbers must be spelled out as words to be pronounced in Italian.
- Dataset improvement: a better dataset with more diverse voices would improve zero-shot cloning.
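As a workaround for the number-conversion issue, digits can be spelled out before synthesis. Below is a minimal hand-rolled sketch for 0–99; it is illustrative only and not part of this project (a library such as num2words covers the general case).

```python
import re

# Italian number words, hypothetical helper (not the project's code)
UNITS = ["zero", "uno", "due", "tre", "quattro", "cinque", "sei", "sette",
         "otto", "nove", "dieci", "undici", "dodici", "tredici",
         "quattordici", "quindici", "sedici", "diciassette", "diciotto",
         "diciannove"]
TENS = {2: "venti", 3: "trenta", 4: "quaranta", 5: "cinquanta",
        6: "sessanta", 7: "settanta", 8: "ottanta", 9: "novanta"}

def number_to_italian(n: int) -> str:
    """Spell out an integer in 0..99 as an Italian word."""
    if n < 20:
        return UNITS[n]
    tens, unit = divmod(n, 10)
    word = TENS[tens]
    if unit == 0:
        return word
    if unit in (1, 8):       # drop the final vowel before uno/otto
        word = word[:-1]
    suffix = UNITS[unit]
    if unit == 3:            # "tre" gains an accent as a suffix
        suffix = "tré"
    return word + suffix

def spell_numbers(text: str) -> str:
    """Replace 1-2 digit runs in a transcript with Italian words."""
    return re.sub(r"\b\d{1,2}\b",
                  lambda m: number_to_italian(int(m.group())), text)
```

For example, `spell_numbers("Ho 42 anni")` yields "Ho quarantadue anni", which the model can then pronounce.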
### Checkpoints folder
The checkpoints folder contains the model weights saved at specific training steps. The higher the step number, the further the model has progressed in training. These weights can also be used as a starting point to continue training. If you manage to finetune the model to better results, please let me know.
## 📄 License
This project is licensed under the cc-by-4.0 license.