🚀 SpeechT5 TTS Turkish
SpeechT5 TTS Turkish is a version of microsoft/speecht5_tts fine-tuned on the turkishvoicedataset dataset. On the evaluation set it reaches a loss of 0.3079.
🚀 Quick Start
Installation
!pip install datasets soundfile speechbrain
Inference
from transformers import pipeline
from datasets import load_dataset
import soundfile as sf
import torch
from IPython.display import Audio

# Load the fine-tuned Turkish model as a text-to-speech pipeline
synthesiser = pipeline("text-to-speech", "umarigan/speecht5_tts_tr_v1.0")

# Load the speaker embeddings dataset and pick one voice (index 736)
embeddings_dataset = load_dataset("umarigan/turkish_voice_dataset_embedded", split="train")
speaker_embedding = torch.tensor(embeddings_dataset[736]["speaker_embeddings"]).unsqueeze(0)

# Synthesise a Turkish sentence with the selected speaker embedding
speech = synthesiser("Bir berber bir berbere gel beraber bir berber kuralım demiş", forward_params={"speaker_embeddings": speaker_embedding})

# Save the waveform to disk and play it back in the notebook
sf.write("speech.wav", speech["audio"], samplerate=speech["sampling_rate"])
Audio("speech.wav")
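The inference snippet above handles a single sentence. A minimal sketch of looping over several sentences with the same voice might look like the following. The helper name `synthesise_many` is hypothetical and not part of any library; it only assumes that `synthesiser` behaves like the pipeline object above, accepting `forward_params` and returning a dict with `audio` and `sampling_rate` keys.

```python
def synthesise_many(synthesiser, texts, speaker_embedding):
    """Run the TTS pipeline once per sentence, reusing one speaker embedding.

    `synthesise_many` is an illustrative helper, not a library function.
    Returns a list of (text, audio, sampling_rate) tuples in input order.
    """
    results = []
    for text in texts:
        # Same call shape as the single-sentence example above.
        out = synthesiser(text, forward_params={"speaker_embeddings": speaker_embedding})
        results.append((text, out["audio"], out["sampling_rate"]))
    return results
```

Each returned tuple can then be written to its own file with `sf.write`, in the same way as the single-sentence example.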
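🔧 Technical Details
Training Hyperparameters
The following hyperparameters were used during training:
- Learning rate (learning_rate): 1e-05
- Training batch size (train_batch_size): 16
- Evaluation batch size (eval_batch_size): 8
- Random seed (seed): 42
- Gradient accumulation steps (gradient_accumulation_steps): 2
- Total training batch size (total_train_batch_size): 32
- Optimizer (optimizer): Adam with β1 = 0.9, β2 = 0.999, ε = 1e-08
- Learning rate scheduler type (lr_scheduler_type): linear
- Learning rate scheduler warmup steps (lr_scheduler_warmup_steps): 500
- Training steps (training_steps): 6000
- Mixed-precision training (mixed_precision_training): Native AMP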
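To make two of these settings concrete, the sketch below recomputes the total training batch size and the learning rate that a linear schedule with warmup would produce at a given step. It is an illustration under stated assumptions, not the training script: it assumes a single device (so the total batch is the per-step batch times the accumulation steps) and a schedule that ramps linearly from 0 to the peak over the warmup steps and then decays linearly to 0 at the final step. The constant and function names are chosen here for illustration.

```python
# Values taken from the hyperparameter list above.
LEARNING_RATE = 1e-5
TRAIN_BATCH_SIZE = 16
GRAD_ACCUM_STEPS = 2
WARMUP_STEPS = 500
TRAINING_STEPS = 6000

# Assumption: one device, so effective batch = batch size x accumulation steps.
effective_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS


def linear_lr(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay."""
    if step < WARMUP_STEPS:
        # Warmup: ramp from 0 up to the peak learning rate.
        return LEARNING_RATE * step / WARMUP_STEPS
    # Decay: fall linearly from the peak to 0 at the final training step.
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / (TRAINING_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size)
    for step in (0, 250, 500, 3250, 6000):
        print(step, linear_lr(step))
```

Under these assumptions the effective batch size matches the listed total of 32, and the learning rate peaks at 1e-05 at step 500 before decaying to 0 at step 6000.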
Training Results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 0.4436 | 1.8484 | 1000 | 0.3752 |
| 0.3822 | 3.6969 | 2000 | 0.3403 |
| 0.3729 | 5.5453 | 3000 | 0.3233 |
| 0.3451 | 7.3937 | 4000 | 0.3153 |
| 0.3315 | 9.2421 | 5000 | 0.3099 |
| 0.3492 | 11.0906 | 6000 | 0.3079 |
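The reported results can be checked with a few lines of arithmetic. The sketch below copies the figures above into a list, picks the evaluation point with the lowest validation loss, and derives the number of optimizer steps per epoch implied by the step and epoch values. The variable names are illustrative only.

```python
# (training_loss, epoch, step, validation_loss) rows copied from the results above.
rows = [
    (0.4436, 1.8484, 1000, 0.3752),
    (0.3822, 3.6969, 2000, 0.3403),
    (0.3729, 5.5453, 3000, 0.3233),
    (0.3451, 7.3937, 4000, 0.3153),
    (0.3315, 9.2421, 5000, 0.3099),
    (0.3492, 11.0906, 6000, 0.3079),
]

# Evaluation point with the lowest validation loss.
best = min(rows, key=lambda r: r[3])

# Optimizer steps per epoch implied by the last row's step and epoch values.
steps_per_epoch = rows[-1][2] / rows[-1][1]

# Drop in validation loss between consecutive evaluations.
improvements = [round(prev[3] - cur[3], 4) for prev, cur in zip(rows, rows[1:])]

if __name__ == "__main__":
    print("best step:", best[2], "validation loss:", best[3])
    print("steps per epoch:", round(steps_per_epoch))
    print("improvements:", improvements)
```

The lowest validation loss is the final one (0.3079 at step 6000), and each successive evaluation improves by less than the one before. The implied figure of roughly 541 steps per epoch, combined with the total batch size of 32, suggests a training set on the order of 17,000 examples; this is an estimate derived from the reported numbers, not a figure stated by the author.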
Framework Versions
- Transformers 4.45.0.dev0
- PyTorch 2.4.1+cu121
- Datasets 3.0.0
- Tokenizers 0.19.1
📄 License
This project is released under the MIT License.
| Property | Details |
|---|---|
| Model type | Text-to-speech |
| Training data | erenfazlioglu/turkishvoicedataset |