Open-source Text-to-Speech Model tts_transformer-ru-cv7_css10

Developed by facebook

A Transformer-based text-to-speech model from fairseq S², providing a single male speaker voice for Russian, pre-trained on Common Voice v7 and fine-tuned on CSS10.

Speech Synthesis Other #Russian TTS #Single male speaker #Transformer architecture

Downloads 96

Release Time: 3/2/2022

Model Overview

This is a Russian text-to-speech (TTS) model built on the Transformer architecture. It converts Russian text into speech.

Model Features

Russian single male speaker

Single male speaker voice synthesis optimized specifically for Russian

Transformer architecture

Uses the Transformer text-to-speech architecture from fairseq S² for speech synthesis

Multi-dataset training

Pre-trained on Common Voice v7 and fine-tuned on CSS10 to improve speech quality

Model Capabilities

Russian text-to-speech

High-quality speech synthesis

Use Cases

Voice applications

Voice assistant

Provides voice output for Russian voice assistants, generating natural and fluent Russian speech

Audiobooks

Converts Russian text into speech for audiobook production, delivering clear, natural-sounding narration
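
For longer inputs such as audiobook chapters, one common approach is to split the text into sentences and synthesize each one separately. The sketch below shows only the splitting step, using Python's standard re module. The helper split_sentences is hypothetical and not part of fairseq, and the punctuation rule is a simplifying assumption that will mis-split abbreviations.

```python
import re


def split_sentences(text):
    """Split text after '.', '!', '?' or '…' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?…])\s+", text.strip())
    return [p for p in parts if p]


chapter = "Здравствуйте, это пробный запуск. Как дела? Хорошо!"
for sentence in split_sentences(chapter):
    # Each sentence could be passed to the TTS model in turn.
    print(sentence)
```

Each returned sentence can then be synthesized on its own and the resulting audio segments concatenated.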

🚀 tts_transformer-ru-cv7_css10

A text-to-speech model based on the Transformer architecture from fairseq S^2. It supports Russian with a single-speaker male voice, pre-trained on Common Voice v7 and fine-tuned on CSS10.

🚀 Quick Start

This is a Transformer text-to-speech model from fairseq S^2 (paper/code):

Supports Russian language.
Employs a single-speaker male voice.
Pre-trained on Common Voice v7 and fine-tuned on CSS10.

✨ Features

Language Support: Specifically designed for the Russian language.
Voice Characteristics: Utilizes a single-speaker male voice.
Training Data: Pre-trained on Common Voice v7 and fine-tuned on CSS10.

📦 Installation

The original README provides no installation steps. The usage example imports fairseq and IPython, so both packages must be available in the environment.
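
Before running the usage example, it can help to confirm that the packages it imports are present. The following is a minimal sketch using only the standard library; missing_packages is a hypothetical helper, and the list of required names is taken from the import lines of the usage example, not from an official requirements file.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found for import."""
    return [n for n in names if importlib.util.find_spec(n) is None]


# Top-level packages imported by the usage example.
required = ["fairseq", "IPython"]
missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```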

💻 Usage Examples

Basic Usage

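from fairseq.checkpoint_utils import load_model_ensemble_and_task_from_hf_hub
from fairseq.models.text_to_speech.hub_interface import TTSHubInterface
import IPython.display as ipd

# Download the checkpoint from the Hugging Face Hub and build the task.
models, cfg, task = load_model_ensemble_and_task_from_hf_hub(
    "facebook/tts_transformer-ru-cv7_css10",
    arg_overrides={"vocoder": "hifigan", "fp16": False}
)
model = models[0]
TTSHubInterface.update_cfg_with_data_cfg(cfg, task.data_cfg)
# Note: some fairseq versions expect a list of models here; if this call
# raises a TypeError, try task.build_generator([model], cfg) instead.
generator = task.build_generator(model, cfg)

# Russian input text: "Hello, this is a test run."
text = "Здравствуйте, это пробный запуск."

# Convert the text to model input and synthesize the waveform.
sample = TTSHubInterface.get_model_input(task, text)
wav, rate = TTSHubInterface.get_prediction(task, model, generator, sample)

# Play the audio inline (works in a Jupyter/IPython notebook).
ipd.Audio(wav, rate=rate)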
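
The usage example above plays the audio inside a notebook. Outside a notebook, one option is to write the waveform to a WAV file. The sketch below uses only the standard library and assumes the waveform is a one-dimensional sequence of floats in the range [-1.0, 1.0]; whether the wav value returned by get_prediction needs to be moved to the CPU or converted first is left as an assumption to check. The helper save_wav is hypothetical and not part of fairseq, and a synthetic tone with an arbitrary sample rate stands in for model output so the block runs on its own.

```python
import math
import os
import struct
import tempfile
import wave


def save_wav(samples, rate, path):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file.

    Returns the number of frames written.
    """
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip to the valid range
        frames += struct.pack("<h", int(round(s * 32767)))
    with wave.open(path, "wb") as f:
        f.setnchannels(1)   # mono
        f.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM
        f.setframerate(int(rate))
        f.writeframes(bytes(frames))
    return len(frames) // 2


# Stand-in input: one second of a 440 Hz tone (not real model output).
demo_rate = 22050
tone = [0.3 * math.sin(2 * math.pi * 440 * n / demo_rate) for n in range(demo_rate)]
out_path = os.path.join(tempfile.gettempdir(), "tts_demo.wav")
written = save_wav(tone, demo_rate, out_path)
```

With real model output, the call would take the form save_wav(wav, rate, "output.wav"), using the wav and rate values from the usage example.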

📚 Documentation

No detailed documentation is provided in the original README.

🔧 Technical Details

No technical details are provided in the original README.

📄 License

No license information is provided in the original README.

📚 Citation

@inproceedings{wang-etal-2021-fairseq,
    title = "fairseq S{\^{}}2: A Scalable and Integrable Speech Synthesis Toolkit",
    author = "Wang, Changhan  and
      Hsu, Wei-Ning  and
      Adi, Yossi  and
      Polyak, Adam  and
      Lee, Ann  and
      Chen, Peng-Jen  and
      Gu, Jiatao  and
      Pino, Juan",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-demo.17",
    doi = "10.18653/v1/2021.emnlp-demo.17",
    pages = "143--152",
}
