Open-source Russian text-to-speech model speecht5_finetuned_commonvoice_ru_translit

Speecht5 Finetuned Commonvoice Ru Translit

Developed by voxxer

A Russian text-to-speech model based on microsoft/speecht5_tts and fine-tuned on the Common Voice 13 dataset

Speech Synthesis

Transformers

Open Source License: MIT
Tags: #Russian speech synthesis #Transliterated text input #Common Voice fine-tuning

Downloads 57

Release Time: 8/21/2023

Model Overview

This is a Russian text-to-speech (TTS) model. The input must be transliterated Russian text. It is a test model built for the hands-on exercise of the HF Audio Course and is not intended for actual use.

Model Features

Russian speech synthesis

Converts transliterated Russian text into speech

Based on Common Voice dataset

Fine-tuned on the Mozilla Common Voice 13 Russian dataset

Lightweight training

As a practice exercise, the model was fine-tuned for a limited number of training steps (2,000)

Model Capabilities

Russian text-to-speech

Speech synthesis

Use Cases

Education

Speech synthesis teaching example

Used to demonstrate the basic principles and working methods of text-to-speech models

🚀 SpeechT5 - Russian translit

This model is a fine-tuned version of microsoft/speecht5_tts for text-to-speech tasks, trained on the Common Voice 13 dataset.

🚀 Quick Start

This model is a fine-tuned version of microsoft/speecht5_tts on the Common Voice 13 dataset. It achieves a loss of 0.4853 on the evaluation set at the final training step (2,000).

✨ Features

The input should be Russian text in transliterated form (for example, produced with the transliterate package).
This is only a test for the hands-on exercise of the HF Audio Course and is not intended for actual use.
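
The transliteration step can be sketched as follows. This is a minimal example, assuming the model expects Cyrillic converted to Latin characters with the transliterate package's Russian language pack; the helper name `to_translit` is hypothetical, and the exact preprocessing used during fine-tuning is not documented here.

```python
def to_translit(text: str) -> str:
    """Convert Cyrillic Russian text to Latin characters (assumed input form)."""
    # Imported lazily so the sketch can be loaded without the package installed.
    # Install with: pip install transliterate
    from transliterate import translit

    # reversed=True converts from the language's native script (Cyrillic)
    # to Latin; the default direction is Latin to Cyrillic.
    return translit(text, "ru", reversed=True)
```

Calling `to_translit("Привет")` returns a Latin-script string that can then be passed to the model's processor.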

📚 Documentation

Model description

Input should be Russian text in transliterated form (use the transliterate package). This is only a test for the hands-on exercise of the HF Audio Course and is not intended for actual use.
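
For orientation, inference with a SpeechT5 checkpoint of this kind might look like the sketch below. Several things are assumptions rather than facts from this card: the Hub repository id `voxxer/speecht5_finetuned_commonvoice_ru_translit`, the presence of processor files in that repository, the use of the `microsoft/speecht5_hifigan` vocoder, and the all-zero speaker embedding, which is a shape placeholder only. A real 512-dimensional x-vector is needed for usable audio.

```python
# Assumed Hub repository id, inferred from the developer and model names.
MODEL_ID = "voxxer/speecht5_finetuned_commonvoice_ru_translit"
# Vocoder commonly paired with SpeechT5 (assumption for this checkpoint).
VOCODER_ID = "microsoft/speecht5_hifigan"


def synthesize(translit_text, speaker_embedding=None):
    """Return a 1-D waveform array for already transliterated text."""
    # Heavy dependencies are imported lazily so that defining the function
    # does not require torch or transformers to be installed.
    import torch
    from transformers import (
        SpeechT5ForTextToSpeech,
        SpeechT5HifiGan,
        SpeechT5Processor,
    )

    # If the repository lacks processor files, loading the processor from
    # microsoft/speecht5_tts is a possible fallback (untested assumption).
    processor = SpeechT5Processor.from_pretrained(MODEL_ID)
    model = SpeechT5ForTextToSpeech.from_pretrained(MODEL_ID)
    vocoder = SpeechT5HifiGan.from_pretrained(VOCODER_ID)

    if speaker_embedding is None:
        # Placeholder with the expected shape (1, 512); replace it with a
        # real x-vector speaker embedding for intelligible output.
        speaker_embedding = torch.zeros((1, 512))

    inputs = processor(text=translit_text, return_tensors="pt")
    with torch.no_grad():
        speech = model.generate_speech(
            inputs["input_ids"], speaker_embedding, vocoder=vocoder
        )
    return speech.numpy()
```

SpeechT5 produces 16 kHz audio, so the returned array can be written to disk with any WAV writer at that sample rate.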

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 8
eval_batch_size: 2
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 64
optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 400
training_steps: 2000
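
To show how these values relate, the sketch below recomputes the effective batch size and the learning rate at a given step under linear warmup followed by linear decay to zero. The single-device setting (`num_devices = 1`) and the function name are illustrative assumptions; the formula follows the usual linear-with-warmup schedule and may differ in detail from the actual training run.

```python
def linear_warmup_decay_lr(step, base_lr=1e-5, warmup_steps=400, total_steps=2000):
    """Learning rate at `step` for linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay: fall linearly from base_lr at warmup_steps to 0 at total_steps.
    remaining = (total_steps - step) / (total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)


train_batch_size = 8             # per-device batch size from the list above
gradient_accumulation_steps = 8  # from the list above
num_devices = 1                  # assumption: training on a single device

# Number of samples contributing to each optimizer update.
total_train_batch_size = (
    train_batch_size * gradient_accumulation_steps * num_devices
)
```

Under these assumptions the product matches the reported total_train_batch_size of 64, and the learning rate peaks at step 400 before decaying to zero at step 2000.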

Training results

Training Loss	Epoch	Step	Validation Loss
1.0359	0.6	50	0.8176
0.8866	1.19	100	0.6899
0.787	1.79	150	0.6478
0.7477	2.38	200	0.6233
0.6734	2.98	250	0.5630
0.6216	3.58	300	0.5429
0.593	4.17	350	0.5304
0.5817	4.77	400	0.5282
0.5734	5.37	450	0.5167
0.5688	5.96	500	0.5209
0.5662	6.56	550	0.5095
0.5609	7.15	600	0.5127
0.554	7.75	650	0.5041
0.5522	8.35	700	0.5038
0.5372	8.94	750	0.4984
0.5432	9.54	800	0.4995
0.5384	10.13	850	0.4971
0.5345	10.73	900	0.4981
0.5358	11.33	950	0.4942
0.5332	11.92	1000	0.4906
0.5334	12.52	1050	0.4897
0.5301	13.11	1100	0.4914
0.5298	13.71	1150	0.4894
0.524	14.31	1200	0.4871
0.5221	14.9	1250	0.4884
0.525	15.5	1300	0.4883
0.5232	16.1	1350	0.4866
0.5261	16.69	1400	0.4858
0.521	17.29	1450	0.4852
0.5225	17.88	1500	0.4849
0.5219	18.48	1550	0.4860
0.5207	19.08	1600	0.4839
0.5192	19.67	1650	0.4851
0.516	20.27	1700	0.4860
0.5186	20.86	1750	0.4811
0.5233	21.46	1800	0.4841
0.5145	22.06	1850	0.4819
0.5159	22.65	1900	0.4822
0.5146	23.25	1950	0.4831
0.5175	23.85	2000	0.4853
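
The table can be checked for its best checkpoint with a few lines of Python. The sketch below copies the validation losses from the table (evaluated every 50 steps) and locates the minimum; the variable names are illustrative.

```python
EVAL_EVERY = 50  # evaluation interval in steps, as in the table

# Validation losses copied from the table, in step order (50, 100, ..., 2000).
val_losses = [
    0.8176, 0.6899, 0.6478, 0.6233, 0.5630, 0.5429, 0.5304, 0.5282,
    0.5167, 0.5209, 0.5095, 0.5127, 0.5041, 0.5038, 0.4984, 0.4995,
    0.4971, 0.4981, 0.4942, 0.4906, 0.4897, 0.4914, 0.4894, 0.4871,
    0.4884, 0.4883, 0.4866, 0.4858, 0.4852, 0.4849, 0.4860, 0.4839,
    0.4851, 0.4860, 0.4811, 0.4841, 0.4819, 0.4822, 0.4831, 0.4853,
]
steps = [EVAL_EVERY * (i + 1) for i in range(len(val_losses))]

# Tuples compare by loss first, so min() picks the lowest validation loss.
best_loss, best_step = min(zip(val_losses, steps))
final_loss = val_losses[-1]

print(f"best: {best_loss:.4f} at step {best_step}, final: {final_loss:.4f}")
```

The lowest validation loss in the table is 0.4811 at step 1750, slightly below the final value of 0.4853 at step 2000; the curve is essentially flat over the last few hundred steps.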

Framework versions

Transformers 4.31.0
PyTorch 2.0.1+cu118
Datasets 2.14.4
Tokenizers 0.13.3

🔧 Technical Details

The model is a fine-tuned version of microsoft/speecht5_tts on the Common Voice 13 dataset. Training ran for 2,000 steps with the Adam optimizer and a linear learning rate scheduler (400 warmup steps, learning rate 1e-05), using a total train batch size of 64.

📄 License

This model is released under the MIT license.
