Open-source Text-to-Speech Model FastSpeech 2 - Free English Single-speaker Female Voice Synthesis

Text To Speech

Developed by Nithu

A FastSpeech 2 text-to-speech model from fairseq S² that synthesizes English speech in a single female voice.

Speech Synthesis, English #High-quality speech synthesis #Single female speaker #English TTS

Downloads 40

Release Time : 10/20/2023

Model Overview

This is a text-to-speech (TTS) model built on the FastSpeech 2 architecture. It synthesizes English speech in a single female voice and was trained on the LJSpeech dataset.

Model Features

High-quality speech synthesis

Built on the FastSpeech 2 architecture, it generates natural, fluent English speech in a female voice.

Single-speaker model

Focuses on single-speaker (female) voice synthesis, ensuring consistent timbre and quality.

Integrated HiFi-GAN vocoder

Uses HiFi-GAN as the vocoder to provide high-quality audio waveform generation.

Model Capabilities

English text-to-speech

Single-speaker speech synthesis

High-quality audio generation

Use Cases

Speech synthesis applications

Voice assistants

Providing natural voice output for virtual assistants

Generates natural, fluent English speech in a female voice

Audiobooks

Converting text content into speech

Generates a comfortable voice suitable for long listening sessions

Educational applications

Providing voice output for learning apps

Clear English pronunciation aids language learning

🚀 fastspeech2-en-ljspeech

This is a text-to-speech model based on FastSpeech 2 from fairseq S^2. It converts English text into speech in a single-speaker female voice and is trained on the LJSpeech dataset.

🚀 Quick Start

fastspeech2-en-ljspeech is a text-to-speech model from fairseq S^2 (paper/code), based on FastSpeech 2:

Language: English
Voice Type: Single-speaker female voice
Training Dataset: LJSpeech

✨ Features

Library and Task: It belongs to the fairseq library and is designed for the text-to-speech task.
Tags: Associated with tags such as fairseq, audio, and text-to-speech.
Datasets: Trained on the LJSpeech dataset.
Widget Example: You can test it with the text "Hello, this is a test run."

📦 Installation

No specific installation steps are provided in the original document.

💻 Usage Examples

Basic Usage

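from fairseq.checkpoint_utils import load_model_ensemble_and_task_from_hf_hub
from fairseq.models.text_to_speech.hub_interface import TTSHubInterface
import IPython.display as ipd

# Download the checkpoint from the Hugging Face Hub and select HiFi-GAN as the vocoder.
models, cfg, task = load_model_ensemble_and_task_from_hf_hub(
    "facebook/fastspeech2-en-ljspeech",
    arg_overrides={"vocoder": "hifigan", "fp16": False}
)
model = models[0]
TTSHubInterface.update_cfg_with_data_cfg(cfg, task.data_cfg)
# Note: some fairseq versions may expect the list `models` here instead of `model`.
generator = task.build_generator(model, cfg)

text = "Hello, this is a test run."

# Convert the text to model input, then synthesize the waveform and its sample rate.
sample = TTSHubInterface.get_model_input(task, text)
wav, rate = TTSHubInterface.get_prediction(task, model, generator, sample)

# Play the audio inline (this works in a Jupyter/IPython notebook).
ipd.Audio(wav, rate=rate)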
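
The example above plays the audio inline, which only works in a notebook. Outside a notebook you will likely want to write the result to a file instead. The following is a minimal sketch using only the Python standard library; `save_wav` is a hypothetical helper, not part of fairseq, and it assumes the waveform is a one-dimensional sequence of floats in the range [-1.0, 1.0]. A synthetic tone stands in for the model output so the block runs without fairseq installed.

```python
import math
import os
import struct
import tempfile
import wave


def save_wav(samples, rate, path):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file.

    Hypothetical helper for illustration; returns the duration in seconds.
    """
    frames = bytearray()
    for s in samples:
        # Clip to the valid range, then scale to a signed 16-bit integer.
        clipped = max(-1.0, min(1.0, float(s)))
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as f:
        f.setnchannels(1)          # mono
        f.setsampwidth(2)          # 2 bytes per sample = 16-bit PCM
        f.setframerate(int(rate))
        f.writeframes(bytes(frames))
    return len(samples) / float(rate)


# Stand-in for the model output: one second of a 440 Hz tone at 22050 Hz.
demo_rate = 22050
demo = [0.5 * math.sin(2 * math.pi * 440 * n / demo_rate) for n in range(demo_rate)]

out_path = os.path.join(tempfile.gettempdir(), "demo_tone.wav")
duration = save_wav(demo, demo_rate, out_path)
```

With the real model, and assuming `wav` is a one-dimensional torch tensor, the call would look something like `save_wav(wav.cpu().tolist(), rate, "output.wav")`.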

📄 License

No license information is provided in the original document.

📄 Citation

@inproceedings{wang-etal-2021-fairseq,
    title = "fairseq S{\^{}}2: A Scalable and Integrable Speech Synthesis Toolkit",
    author = "Wang, Changhan  and
      Hsu, Wei-Ning  and
      Adi, Yossi  and
      Polyak, Adam  and
      Lee, Ann  and
      Chen, Peng-Jen  and
      Gu, Jiatao  and
      Pino, Juan",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-demo.17",
    doi = "10.18653/v1/2021.emnlp-demo.17",
    pages = "143--152",
}
