🚀 SpeechT5-it
This model, SpeechT5-it, is a fine-tuned variant of microsoft/speecht5_tts for Italian, trained on the VoxPopuli dataset. It offers high-quality text-to-speech conversion and achieves a loss of 0.46 on the evaluation set.
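For convenience, here is a minimal inference sketch using the standard SpeechT5 pipeline from 🤗 Transformers. The model ID below is a placeholder for wherever this checkpoint is hosted, and the random speaker embedding is only a stand-in for a real x-vector:

```python
import torch
import soundfile as sf
from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan

# Processor comes from the base model; the TTS weights come from this checkpoint
# ("<this-model-id>" is a placeholder for the actual path or Hub ID).
processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
model = SpeechT5ForTextToSpeech.from_pretrained("<this-model-id>")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

inputs = processor(text="Buongiorno, come stai?", return_tensors="pt")

# SpeechT5 conditions generation on a 512-dim speaker x-vector. A random vector
# is used here only so the example runs; in practice, load a real embedding
# (e.g. from the Matthijs/cmu-arctic-xvectors dataset).
speaker_embeddings = torch.randn(1, 512)

speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
sf.write("speech.wav", speech.numpy(), samplerate=16000)
```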
📚 Documentation
Model Information
| Property | Details |
|----------|---------|
| Model Type | Fine-tuned version of microsoft/speecht5_tts |
| Training Data | facebook/voxpopuli |
| Pipeline Tag | text-to-speech |
| Base Model | microsoft/speecht5_tts |
Evaluation Results
This model achieves the following results on the evaluation set:
- Loss: 0.4600
Training and Evaluation Data
This model was trained on the VoxPopuli dataset with the Italian configuration (`it`). The evaluation was performed on the validation split of the same dataset.
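The corresponding splits can be loaded with 🤗 Datasets; a sketch, assuming the standard split names of facebook/voxpopuli:

```python
from datasets import load_dataset

# Italian ("it") configuration of VoxPopuli; the validation split
# of the same configuration was used for evaluation.
train_ds = load_dataset("facebook/voxpopuli", "it", split="train")
eval_ds = load_dataset("facebook/voxpopuli", "it", split="validation")
```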
Training Procedure
Training Hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 100
- num_epochs: 40
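For reference, a sketch of how these hyperparameters map onto 🤗 Transformers training arguments; the output directory is an assumption, and the Adam betas/epsilon listed above are the library defaults:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="speecht5_tts_voxpopuli_it",  # assumed name, not recorded in the card
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,  # effective train batch size: 4 * 4 = 16
    lr_scheduler_type="linear",
    warmup_steps=100,
    num_train_epochs=40,
    seed=42,
)
```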
Training Results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 0.5641 | 1.0 | 712 | 0.5090 |
| 0.5394 | 2.0 | 1424 | 0.4915 |
| 0.5277 | 3.0 | 2136 | 0.4819 |
| 0.5136 | 4.0 | 2848 | 0.4798 |
| 0.5109 | 5.0 | 3560 | 0.4733 |
| 0.5078 | 6.0 | 4272 | 0.4731 |
| 0.5033 | 7.0 | 4984 | 0.4692 |
| 0.5021 | 8.0 | 5696 | 0.4691 |
| 0.4984 | 9.0 | 6408 | 0.4670 |
| 0.488 | 10.0 | 7120 | 0.4641 |
| 0.491 | 11.0 | 7832 | 0.4641 |
| 0.4918 | 12.0 | 8544 | 0.4647 |
| 0.4933 | 13.0 | 9256 | 0.4622 |
| 0.499 | 14.0 | 9968 | 0.4619 |
| 0.4906 | 15.0 | 10680 | 0.4608 |
| 0.4884 | 16.0 | 11392 | 0.4622 |
| 0.4847 | 17.0 | 12104 | 0.4616 |
| 0.4916 | 18.0 | 12816 | 0.4592 |
| 0.4845 | 19.0 | 13528 | 0.4600 |
| 0.4788 | 20.0 | 14240 | 0.4594 |
| 0.4746 | 21.0 | 14952 | 0.4607 |
| 0.4875 | 22.0 | 15664 | 0.4615 |
| 0.4831 | 23.0 | 16376 | 0.4597 |
| 0.4798 | 24.0 | 17088 | 0.4595 |
| 0.4727 | 25.0 | 17800 | 0.4592 |
| 0.4736 | 26.0 | 18512 | 0.4598 |
| 0.4746 | 27.0 | 19224 | 0.4608 |
| 0.4728 | 28.0 | 19936 | 0.4589 |
| 0.4771 | 29.0 | 20648 | 0.4593 |
| 0.4743 | 30.0 | 21360 | 0.4588 |
| 0.4785 | 31.0 | 22072 | 0.4601 |
| 0.4757 | 32.0 | 22784 | 0.4597 |
| 0.4731 | 33.0 | 23496 | 0.4598 |
| 0.4746 | 34.0 | 24208 | 0.4593 |
| 0.4715 | 35.0 | 24920 | 0.4599 |
| 0.4769 | 36.0 | 25632 | 0.4622 |
| 0.4778 | 37.0 | 26344 | 0.4605 |
| 0.4798 | 38.0 | 27056 | 0.4594 |
| 0.4694 | 39.0 | 27768 | 0.4607 |
| 0.468 | 40.0 | 28480 | 0.4600 |
Framework Versions
- Transformers 4.30.0.dev0
- Pytorch 2.0.1+cu117
- Datasets 2.13.1
- Tokenizers 0.13.3
📄 License
This model is released under the MIT license.