🚀 Fine-tuned SpeechT5 TTS Model for Haitian Creole
This model is a fine-tuned version of [microsoft/speecht5_tts](https://huggingface.co/microsoft/speecht5_tts) that performs text-to-speech conversion for Haitian Creole.
🚀 Quick Start
This fine-tuned SpeechT5 TTS model can be used directly for text-to-speech applications in Haitian Creole and loaded through the standard SpeechT5 classes in Transformers.
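As a starting point, inference could look like the sketch below. It is not taken from the original card: `MODEL_ID` is a hypothetical placeholder for wherever this checkpoint lives, the `synthesize` helper is illustrative, and the sketch assumes the checkpoint ships its processor files and is paired with the stock `microsoft/speecht5_hifigan` vocoder. SpeechT5 also needs a 512-dimensional speaker embedding (x-vector), which the caller must supply.

```python
SAMPLE_RATE = 16_000         # SpeechT5 and its HiFi-GAN vocoder produce 16 kHz audio
SPEAKER_EMBEDDING_DIM = 512  # size of the x-vector speaker embedding SpeechT5 expects

# Hypothetical placeholder: replace with the real Hub ID or local path of this checkpoint.
MODEL_ID = "your-username/speecht5-tts-haitian-creole"


def synthesize(text, speaker_embedding, model_id=MODEL_ID):
    """Return a 1-D float waveform (16 kHz) for Haitian Creole `text`.

    `speaker_embedding` is expected to be a torch tensor of shape (1, 512).
    """
    if tuple(speaker_embedding.shape) != (1, SPEAKER_EMBEDDING_DIM):
        raise ValueError(
            f"speaker_embedding must have shape (1, {SPEAKER_EMBEDDING_DIM}), "
            f"got {tuple(speaker_embedding.shape)}"
        )

    # Heavy dependencies are imported lazily so that defining this helper
    # does not require torch or transformers to be installed.
    import torch
    from transformers import (
        SpeechT5ForTextToSpeech,
        SpeechT5HifiGan,
        SpeechT5Processor,
    )

    # Assumption: the fine-tuned repo includes the processor files. If it does
    # not, load the processor from "microsoft/speecht5_tts" instead.
    processor = SpeechT5Processor.from_pretrained(model_id)
    model = SpeechT5ForTextToSpeech.from_pretrained(model_id)
    vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

    inputs = processor(text=text, return_tensors="pt")
    with torch.no_grad():
        # The model predicts a mel spectrogram; the vocoder turns it into audio.
        speech = model.generate_speech(
            inputs["input_ids"],
            speaker_embeddings=speaker_embedding,
            vocoder=vocoder,
        )
    return speech.cpu().numpy()
```

Given a tensor `xvector` of shape `(1, 512)` for the desired voice, `synthesize("Bonjou, kijan ou ye?", xvector)` returns a waveform that can be saved with, for example, `soundfile.write("out.wav", waveform, samplerate=SAMPLE_RATE)`.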
✨ Features
- Language-specific: Fine-tuned for Haitian Creole, so it synthesizes speech directly from Haitian Creole text.
- Based on SpeechT5: Uses the SpeechT5 architecture, a T5-inspired encoder-decoder Transformer for speech and text, here in its text-to-speech configuration.
📚 Documentation
Model Description
The model is based on SpeechT5, an encoder-decoder Transformer architecture inspired by T5 (Text-to-Text Transfer Transformer) and adapted to speech and text. This checkpoint uses its text-to-speech configuration and converts Haitian Creole input text into the corresponding speech.
Intended Uses & Limitations
The model is intended for text-to-speech (TTS) applications in Haitian Creole. It generates speech from written text for applications such as audiobook narration and voice assistants.
However, there are some limitations to be aware of:
- Output quality depends heavily on the quality and diversity of the training data. Fine-tuning on more diverse or domain-specific data may improve it.
- Like all machine learning models, it can produce inaccurate or erroneous speech, especially for complex sentences or domain-specific jargon.
Training and Evaluation Data
The model was fine-tuned on the CMU Haitian dataset, which contains text and corresponding audio samples in Haitian Creole. The dataset was split into training and evaluation sets to assess the model's performance.
Training Procedure
Training Hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- per_device_train_batch_size: 16
- gradient_accumulation_steps: 2
- warmup_steps: 500
- max_steps: 4000
- gradient_checkpointing: True
- fp16: True
- evaluation_strategy: no
- per_device_eval_batch_size: 8
- save_steps: 1000
- logging_steps: 25
- report_to: ["tensorboard"]
- greater_is_better: False
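The names in this list match keyword arguments of `Seq2SeqTrainingArguments` in Transformers, so the configuration can be sketched as a plain dictionary. The `effective_batch_size` helper and the single-device default are assumptions for illustration; the card does not state the hardware used.

```python
# Hyperparameters copied from the list above, keyed by their
# Seq2SeqTrainingArguments argument names.
TRAINING_KWARGS = {
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 16,
    "gradient_accumulation_steps": 2,
    "warmup_steps": 500,
    "max_steps": 4000,
    "gradient_checkpointing": True,
    "fp16": True,
    "evaluation_strategy": "no",
    "per_device_eval_batch_size": 8,
    "save_steps": 1000,
    "logging_steps": 25,
    "report_to": ["tensorboard"],
    "greater_is_better": False,
}


def effective_batch_size(kwargs, num_devices=1):
    """Samples consumed per optimizer step.

    num_devices=1 is an assumption, not a value stated in the card.
    """
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )


batch = effective_batch_size(TRAINING_KWARGS)
print(batch)                                 # 16 * 2 * 1 = 32
print(batch * TRAINING_KWARGS["max_steps"])  # 32 * 4000 = 128000 samples seen
```

With Transformers 4.31 these could be passed as `Seq2SeqTrainingArguments(output_dir="speecht5_tts_ht", **TRAINING_KWARGS)`, where the output directory is a hypothetical name. `fp16=True` requires a CUDA GPU, and later Transformers releases rename `evaluation_strategy` to `eval_strategy`.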
Training Results
The training progress and evaluation results are as follows:
| Training Loss | Epoch | Step | Validation Loss |
|---------------|-------|------|-----------------|
| 0.5147        | 2.42  | 1000 | 0.4753          |
| 0.4932        | 4.84  | 2000 | 0.4629          |
| 0.4926        | 7.26  | 3000 | 0.4566          |
| 0.4907        | 9.69  | 4000 | 0.4542          |
| 0.4839        | 12.11 | 5000 | 0.4532          |
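One way to read these results is to look at how much the validation loss changes between checkpoints and how many optimizer steps make up an epoch. The short sketch below does that arithmetic on the reported values only; the helper names are illustrative.

```python
# (training_loss, epoch, step, validation_loss), copied from the results above.
ROWS = [
    (0.5147, 2.42, 1000, 0.4753),
    (0.4932, 4.84, 2000, 0.4629),
    (0.4926, 7.26, 3000, 0.4566),
    (0.4907, 9.69, 4000, 0.4542),
    (0.4839, 12.11, 5000, 0.4532),
]


def steps_per_epoch(rows):
    """Estimate optimizer steps per epoch from each (epoch, step) pair."""
    return [step / epoch for _, epoch, step, _ in rows]


def validation_deltas(rows):
    """Change in validation loss between consecutive checkpoints."""
    losses = [val for *_, val in rows]
    return [round(later - earlier, 4) for earlier, later in zip(losses, losses[1:])]


for estimate in steps_per_epoch(ROWS):
    print(f"{estimate:.1f} steps per epoch")
print(validation_deltas(ROWS))
```

Every row gives roughly 413 steps per epoch, and each successive drop in validation loss is smaller than the previous one, which suggests the curve is flattening by the last checkpoints.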
Training Output
The training was completed with the following output:
- Global Step: 4000
- Training Loss: 0.3344
- Training Runtime: 7123.63 seconds
- Training Samples per Second: 17.97
- Training Steps per Second: 0.562
- Total FLOPs: 1.1690e+16
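These figures can be cross-checked against each other with a little arithmetic. The sketch below uses only the reported numbers; reading the result as a single-device run is an inference, not something the card states.

```python
# Values copied from the training output above.
GLOBAL_STEP = 4000
RUNTIME_SECONDS = 7123.63
SAMPLES_PER_SECOND = 17.97

steps_per_second = GLOBAL_STEP / RUNTIME_SECONDS
# samples/second divided by steps/second gives samples per optimizer step.
samples_per_step = SAMPLES_PER_SECOND / steps_per_second
runtime_hours = RUNTIME_SECONDS / 3600

print(f"{steps_per_second:.3f} steps/s")             # matches the reported 0.562
print(f"{samples_per_step:.1f} samples per step")    # about 32
print(f"{runtime_hours:.2f} hours of training")      # just under 2 hours
```

About 32 samples per optimizer step is what a per-device batch size of 16 with 2 gradient accumulation steps yields on one device.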
Framework Versions
- Transformers 4.31.0
- PyTorch 2.0.1+cu118
- Datasets 2.13.1
- Tokenizers 0.13.3
🔧 Technical Details
The model is a fine-tuned version of [microsoft/speecht5_tts](https://huggingface.co/microsoft/speecht5_tts). It keeps the SpeechT5 text-to-speech architecture unchanged; fine-tuning on the CMU Haitian dataset is what enables it to generate speech in Haitian Creole.