tts_transformer-zh-cv7_css10: Open-Source Text-to-Speech Model for Simplified Chinese (Single Female Voice)

Developed by facebook

A Transformer-based text-to-speech model built on fairseq S^2, supporting Simplified Chinese with a single female voice, trained on Common Voice v7 and CSS10 datasets.

Tags: Speech Synthesis, Chinese, Chinese female voice synthesis, Transformer architecture, Multi-dataset training

Release Date: 3/2/2022

Model Overview

This is a Transformer-based text-to-speech (TTS) model for Simplified Chinese that synthesizes speech in a single female voice. It was pre-trained on the Common Voice v7 dataset and fine-tuned on the CSS10 dataset.

Model Features

Transformer-based architecture

Uses the Transformer TTS architecture implemented in fairseq S^2 for speech synthesis

Chinese speech synthesis

Supports speech synthesis for Simplified Chinese

Single female voice

Uses a single female voice for consistent timbre in speech synthesis

Multi-dataset training

Pre-trained on Common Voice v7 and fine-tuned on CSS10 to enhance speech quality

Model Capabilities

Text-to-speech

Chinese speech synthesis

High-quality speech generation

Use Cases

Voice interaction

Voice assistants

Provides natural, fluent voice output for Chinese voice assistants

Audiobooks

Converts Chinese text into clear, intelligible narration for audiobook production

Assistive technology

Visual impairment assistance

Offers text-to-speech output that helps visually impaired individuals access textual information
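
For long inputs such as audiobook chapters, one possible approach is to synthesize sentence by sentence rather than passing a whole chapter at once. The sketch below splits Chinese text on sentence-final punctuation. `split_sentences` is a hypothetical helper and the punctuation set is an assumption; neither comes from fairseq or the model card.

```python
import re

# Assumed set of sentence-final marks (full-width and ASCII variants).
_SENTENCE = re.compile(r"[^。！？!?]+[。！？!?]*")


def split_sentences(text):
    """Split text into sentences, keeping the closing punctuation."""
    # Each match is a run of non-terminal characters plus any trailing marks.
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]
```

Each returned sentence could then be passed to the model separately and the resulting waveforms concatenated.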

🚀 tts_transformer-zh-cv7_css10

A Transformer-based text-to-speech model from fairseq S^2 that supports Simplified Chinese with a single-speaker female voice.

🚀 Quick Start

This is a Transformer text-to-speech model from fairseq S^2 (paper/code):

Supports Simplified Chinese.
Employs a single-speaker female voice.
Pre-trained on Common Voice v7 and fine-tuned on CSS10.

💻 Usage Examples

Basic Usage

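from fairseq.checkpoint_utils import load_model_ensemble_and_task_from_hf_hub
from fairseq.models.text_to_speech.hub_interface import TTSHubInterface
import IPython.display as ipd  # only needed for in-notebook playback

# Download the checkpoint from the Hugging Face Hub and build the task.
# The HiFi-GAN vocoder converts predicted spectrograms to a waveform.
models, cfg, task = load_model_ensemble_and_task_from_hf_hub(
    "facebook/tts_transformer-zh-cv7_css10",
    arg_overrides={"vocoder": "hifigan", "fp16": False}
)
model = models[0]
TTSHubInterface.update_cfg_with_data_cfg(cfg, task.data_cfg)
# Assumption to verify: some fairseq releases may expect a list here,
# i.e. task.build_generator([model], cfg), if this call raises a TypeError.
generator = task.build_generator(model, cfg)

# Simplified Chinese input: "Hello, this is a test run."
text = "您好，这是试运行。"

# Convert the text to model input, then synthesize the waveform and sample rate.
sample = TTSHubInterface.get_model_input(task, text)
wav, rate = TTSHubInterface.get_prediction(task, model, generator, sample)

# Play the result inline (Jupyter/IPython only).
ipd.Audio(wav, rate=rate)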
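
Outside a notebook, `ipd.Audio` has nothing to render to, so it can help to write the waveform to disk instead. The sketch below uses only the Python standard library. `save_wav` is a hypothetical helper, not part of fairseq, and it assumes the samples are mono floats in the range [-1.0, 1.0].

```python
import struct
import wave


def save_wav(path, samples, sample_rate):
    """Write mono float samples to a 16-bit PCM WAV file (hypothetical helper)."""
    frames = bytearray()
    for s in samples:
        # Clip to the assumed [-1.0, 1.0] range, then scale to signed 16-bit.
        s = max(-1.0, min(1.0, float(s)))
        frames += struct.pack("<h", int(round(s * 32767)))
    with wave.open(path, "wb") as f:
        f.setnchannels(1)            # mono
        f.setsampwidth(2)            # 2 bytes per sample = 16-bit PCM
        f.setframerate(int(sample_rate))
        f.writeframes(bytes(frames))
```

With the variables from the example above, a call such as `save_wav("output.wav", wav.tolist(), rate)` should work, assuming `wav` is a one-dimensional float tensor on the CPU.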

📄 Citation

@inproceedings{wang-etal-2021-fairseq,
    title = "fairseq S{\^{}}2: A Scalable and Integrable Speech Synthesis Toolkit",
    author = "Wang, Changhan  and
      Hsu, Wei-Ning  and
      Adi, Yossi  and
      Polyak, Adam  and
      Lee, Ann  and
      Chen, Peng-Jen  and
      Gu, Jiatao  and
      Pino, Juan",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-demo.17",
    doi = "10.18653/v1/2021.emnlp-demo.17",
    pages = "143--152",
}

📋 Information Table

Property	Details
Library Name	fairseq
Task	text-to-speech
Tags	fairseq, audio, text-to-speech
Language	Chinese
Datasets	common_voice, css10
Widget Example Text	"您好，这是试运行。"
Widget Example Title	"Hello, this is a test run."
