F5-TTS Italian Fine-Tuned Model - Open Source and Free for Italian Text-to-Speech

F5 Ita Test

Developed by alien79

This is the Italian fine-tuned version of the F5-TTS model, trained on the facebook/multilingual_librispeech dataset, specializing in Italian text-to-speech tasks.

Speech Synthesis, Other #Italian TTS #Prosody Optimization #Single-language Focus

Downloads 98

Release Time: 11/27/2024

Model Overview

This model is the Italian fine-tuned version of F5-TTS, specifically designed for converting Italian text to speech. Due to catastrophic forgetting during fine-tuning, the model can no longer correctly handle English pronunciation.

Model Features

Italian Specialization

Fine-tuned specifically for Italian speech synthesis. Italian pronunciation is not yet perfect, and English is no longer handled correctly.

Large-scale Training

Trained on 247+ hours of Italian speech data.

Multiple Checkpoint Options

Offers checkpoints from various training stages, allowing users to select models with different levels of training.

Model Capabilities

Italian Text-to-Speech

Speech Synthesis

Use Cases

Voice Applications

Italian Voice Assistant

Provides voice interaction functionality for Italian-speaking users.

Audiobook Generation

Converts Italian text to speech for audiobook production.

🚀 F5-TTS Italian Finetune

This is an Italian finetune for F5-TTS, aiming to provide high-quality text-to-speech for Italian.

🚀 Quick Start

This project is an Italian finetune for F5-TTS. Its limitations and characteristics are described below.

✨ Features

Language Specific: This model is specifically finetuned for Italian and cannot speak English properly.
Training Data: Trained on 247+ hours of audio from the "train" split of the facebook/multilingual_librispeech dataset, at 6717 steps per epoch.
Model Status: Fine-tuning caused catastrophic forgetting of English, and the Italian pronunciation is not perfect. However, many checkpoints are available for further training, possibly with different datasets.

💻 Usage Examples

The run.py file is an example of how to extract the WAV files and produce the metadata.csv used for training.
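The actual run.py is not reproduced on this page, so the following is only a minimal sketch of the same idea: write each clip to a WAV file and list it in metadata.csv. The function names `write_wav` and `export_dataset` are hypothetical, and the pipe-delimited `audio_file|text` layout is an assumption about what the F5-TTS data preparation step expects; check it against the F5-TTS version you train with.

```python
import csv
import struct
import wave
from pathlib import Path


def write_wav(path, samples, sample_rate):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wf.setframerate(sample_rate)
        wf.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))


def export_dataset(records, out_dir):
    """Write wavs/*.wav plus metadata.csv and return the number of clips.

    records: iterable of (samples, sample_rate, transcript) tuples.
    """
    out_dir = Path(out_dir)
    wav_dir = out_dir / "wavs"
    wav_dir.mkdir(parents=True, exist_ok=True)

    rows = []
    for index, (samples, sample_rate, transcript) in enumerate(records):
        name = f"audio_{index:06d}.wav"
        write_wav(wav_dir / name, samples, sample_rate)
        rows.append((f"wavs/{name}", transcript.strip()))

    # Assumed layout: pipe-delimited rows under an "audio_file|text" header.
    with open(out_dir / "metadata.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="|", lineterminator="\n")
        writer.writerow(["audio_file", "text"])
        writer.writerows(rows)
    return len(rows)
```

In practice the records would come from iterating over the dataset's "train" split, taking each example's audio samples, sampling rate, and transcript. The exact field names depend on the dataset and should be checked before use.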

📚 Documentation

Current most trained model

The most trained model is italian_59kh/model_464400.safetensors (approximately 70 epochs).
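The epoch estimate can be checked from the step number in the file name and the 6717 steps per epoch reported under Features. This is a small worked calculation; the helper name `steps_to_epochs` is made up for illustration.

```python
STEPS_PER_EPOCH = 6717  # value reported in the model card


def steps_to_epochs(step, steps_per_epoch=STEPS_PER_EPOCH):
    """Convert a training step count into (fractional) epochs."""
    return step / steps_per_epoch


print(round(steps_to_epochs(464400), 2))  # 69.14
```

Step 464400 therefore corresponds to a little over 69 epochs, consistent with the "approximately 70" quoted above.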

Folder Structure

| - italian_59kh
|   | - checkpoints

italian_59kh

This folder contains the weights at specific steps. The higher the number, the further the model went into training. Note that the weights in this folder cannot be used to resume training; use the checkpoints folder instead.

italian_59kh/checkpoints

This folder contains the weights of the checkpoints at specific steps. The higher the number, the further the model went into training. The weights in this folder can be used as a starting point to continue training.
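Because the step number is encoded in each file name, the most trained file in a folder can be picked programmatically. The sketch below assumes files follow the `model_<step>` naming seen in model_464400.safetensors; the `.pt` extension for the checkpoints folder is a guess, and `list_by_step` and `most_trained` are hypothetical helper names.

```python
import re
from pathlib import Path

# Assumed naming: model_<step>.safetensors, or model_<step>.pt for checkpoints.
STEP_PATTERN = re.compile(r"^model_(\d+)\.(safetensors|pt)$")


def list_by_step(folder):
    """Return (step, path) pairs for matching files, sorted by step."""
    found = []
    for path in Path(folder).iterdir():
        match = STEP_PATTERN.match(path.name)
        if match and path.is_file():
            found.append((int(match.group(1)), path))
    return sorted(found, key=lambda item: item[0])


def most_trained(folder):
    """Return the path with the highest step number, or None if none match."""
    items = list_by_step(folder)
    return items[-1][1] if items else None
```

Sorting on the parsed integer rather than the file name matters here: a plain string sort would rank model_9000 above model_464400.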

📄 License

The project is licensed under cc-by-4.0.

⚠️ Important Note

UPDATE: A better version with improved prosody is available here: https://huggingface.co/alien79/F5-TTS-italian

💡 Usage Tip

The model has known limitations: it has forgotten English, and its Italian pronunciation is imperfect. The available checkpoints can be used to continue training, possibly with different datasets.

Property	Details
Datasets	facebook/multilingual_librispeech
Language	it
Base Model	SWivid/F5-TTS
Pipeline Tag	text-to-speech
License	cc-by-4.0
Library Name	f5-tts
