zlm_b64_le5_s8000: Open-Source Speech Synthesis Model Fine-tuned from SpeechT5

Developed by mikhail-panzo

A speech synthesis model fine-tuned from microsoft/speecht5_tts. The training dataset is not specified; the model reaches a validation loss of 0.3771 on its evaluation set.

Downloads 29

Release Time: 4/28/2024

Model Overview

This is a text-to-speech (TTS) model fine-tuned from microsoft/speecht5_tts. Its intended uses and training data are not documented.

Efficient Fine-tuning

Fine-tuned from the pre-trained SpeechT5 model for 8000 training steps, with validation loss falling from 0.6029 at step 500 to 0.3771 at step 8000.

Optimized Training Configuration

Uses the Adam optimizer with a learning rate of 1e-05 and a batch size of 64, together with a linear learning rate schedule and 2000 warm-up steps.
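
To make the schedule concrete, here is a minimal sketch of a linear schedule with warm-up using the values above. It assumes the learning rate rises linearly from 0 to 1e-05 over the first 2000 steps and then decays linearly to 0 at step 8000. The decay endpoint is an assumption, since the source states only the schedule type and the warm-up length. The function name `linear_schedule_lr` is hypothetical.

```python
def linear_schedule_lr(step, base_lr=1e-5, warmup_steps=2000, total_steps=8000):
    """Learning rate at a given step for linear warm-up followed by linear decay.

    Assumption: the rate decays to 0 exactly at total_steps.
    """
    if step < warmup_steps:
        # Warm-up phase: scale linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: scale linearly from base_lr down to 0.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Print the rate at a few points along the 8000-step run.
for step in (0, 1000, 2000, 5000, 8000):
    print(step, linear_schedule_lr(step))
```

Under these assumptions the peak rate of 1e-05 is reached only at step 2000, so most of the run trains at a lower rate than the nominal value.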

Text-to-Speech Conversion

Speech Synthesis

Speech Synthesis Applications

Voice Assistants

Can be used to generate natural-sounding speech for voice assistants.

Audiobooks

Can convert written text into speech for producing audiobooks.
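
For use cases like those above, a minimal inference sketch with the Hugging Face transformers SpeechT5 classes might look like the following. Several things here are assumptions rather than facts from this page: the repository id `mikhail-panzo/zlm_b64_le5_s8000` is inferred from the developer and model names, the processor is loaded from the base model, the vocoder is `microsoft/speecht5_hifigan`, and the output is written at 16 kHz. The caller must supply a 512-dimensional speaker embedding (x-vector), since this page does not say which speaker the model was tuned on. The function name `synthesize` is hypothetical.

```python
# Assumed identifiers; only the base model name appears in the source.
MODEL_ID = "mikhail-panzo/zlm_b64_le5_s8000"  # assumption: Hugging Face repo id
BASE_ID = "microsoft/speecht5_tts"            # base model named in the source
VOCODER_ID = "microsoft/speecht5_hifigan"     # assumption: standard SpeechT5 vocoder
SAMPLE_RATE = 16000                           # assumption: SpeechT5 output rate in Hz


def synthesize(text, speaker_embedding, out_path="speech.wav"):
    """Synthesize `text` to a WAV file and return the output path.

    `speaker_embedding` is a sequence of 512 floats (an x-vector).
    """
    # Heavy dependencies are imported lazily so the module loads without them.
    import soundfile as sf
    import torch
    from transformers import (
        SpeechT5ForTextToSpeech,
        SpeechT5HifiGan,
        SpeechT5Processor,
    )

    processor = SpeechT5Processor.from_pretrained(BASE_ID)
    model = SpeechT5ForTextToSpeech.from_pretrained(MODEL_ID)
    vocoder = SpeechT5HifiGan.from_pretrained(VOCODER_ID)

    # Tokenize the text and shape the speaker embedding as (1, 512).
    inputs = processor(text=text, return_tensors="pt")
    embedding = torch.as_tensor(speaker_embedding, dtype=torch.float32).reshape(1, 512)

    # Generate a waveform; the vocoder converts the spectrogram to audio.
    speech = model.generate_speech(inputs["input_ids"], embedding, vocoder=vocoder)
    sf.write(out_path, speech.numpy(), samplerate=SAMPLE_RATE)
    return out_path
```

A call would look like `synthesize("Hello there.", xvector)`, where `xvector` comes from a speaker encoder of your choice. Output quality depends heavily on that embedding, so an arbitrary or all-zero vector is unlikely to sound natural.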

Training Loss	Epoch	Step	Validation Loss
0.7074	0.4188	500	0.6029
0.5916	0.8375	1000	0.4968
0.5206	1.2563	1500	0.4592
0.4979	1.6750	2000	0.4388
0.4852	2.0938	2500	0.4211
0.4615	2.5126	3000	0.4088
0.4521	2.9313	3500	0.4002
0.4431	3.3501	4000	0.3948
0.4393	3.7688	4500	0.3914
0.4271	4.1876	5000	0.3861
0.4317	4.6064	5500	0.3836
0.4265	5.0251	6000	0.3809
0.4240	5.4439	6500	0.3794
0.4123	5.8626	7000	0.3786
0.4117	6.2814	7500	0.3776
0.4155	6.7002	8000	0.3771
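
The table above also allows a rough reading of the training run. The sketch below copies the validation losses from the table and derives the overall relative improvement, the per-checkpoint gains, and an approximate training set size. The size estimate is an assumption: it treats each step as one batch of 64 examples and the Epoch column as fractional passes over the training set.

```python
# Validation loss at steps 500, 1000, ..., 8000, copied from the table.
val_loss = [0.6029, 0.4968, 0.4592, 0.4388, 0.4211, 0.4088, 0.4002, 0.3948,
            0.3914, 0.3861, 0.3836, 0.3809, 0.3794, 0.3786, 0.3776, 0.3771]
steps = list(range(500, 8001, 500))

# Relative improvement between the first and last checkpoints.
relative_drop = (val_loss[0] - val_loss[-1]) / val_loss[0]

# Improvement between consecutive checkpoints (positive means loss fell).
deltas = [prev - cur for prev, cur in zip(val_loss, val_loss[1:])]

# Rough dataset size, assuming one batch of 64 examples per step.
BATCH_SIZE = 64
FINAL_STEP, FINAL_EPOCH = 8000, 6.7002  # last row of the table
steps_per_epoch = FINAL_STEP / FINAL_EPOCH
approx_examples = round(steps_per_epoch) * BATCH_SIZE

print(f"relative drop in validation loss: {relative_drop:.1%}")
print(f"gain over the first 500-step interval: {deltas[0]:.4f}")
print(f"gain over the last 500-step interval: {deltas[-1]:.4f}")
print(f"steps per epoch: {steps_per_epoch:.0f}, approx. examples: {approx_examples}")
```

The shrinking per-checkpoint gains suggest the validation loss was close to a plateau by step 8000, although the table alone cannot show whether further training would have helped.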
