FastSpeech 2 Open-Source Text-to-Speech Model: Free English Single-Speaker Female Voice Synthesis

Text To Speech

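Developed by Nithu

A FastSpeech 2 text-to-speech model based on fairseq S², supporting English single-speaker female voice synthesis.

Speech Synthesis, English #High-Quality Speech Synthesis #Single-Speaker Female Voice #English TTS

Downloads: 40

Released: 10/20/2023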

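Model Overview

This is a text-to-speech (TTS) model based on the FastSpeech 2 architecture. It is built specifically for English single-speaker female voice synthesis and was trained on the LJSpeech dataset.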

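Model Features

High-quality speech synthesis

Built on the FastSpeech 2 architecture, the model generates natural, fluent English speech in a female voice.

Single-speaker model

The model focuses on a single (female) speaker, which keeps timbre and quality consistent.

Integrated HiFi-GAN vocoder

HiFi-GAN serves as the vocoder and generates high-quality audio waveforms.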

Model Capabilities

English text-to-speech

Single-speaker speech synthesis

High-quality audio generation

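Use Cases

Speech synthesis applications

Voice assistants

Provide natural speech output for virtual assistants

Generates natural, fluent English speech in a female voice

Audiobooks

Convert text content into speech

Generates comfortable speech suited to extended listening

Educational applications

Provide speech output for learning apps

Clear English pronunciation supports language learning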
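
For long-form text such as the audiobook use case above, one common approach is to synthesize the text in sentence-sized chunks rather than in a single pass. The sketch below shows only the chunking step. `split_sentences` and its `max_chars` parameter are hypothetical helpers introduced for illustration and are not part of fairseq; the splitting rule is deliberately naive.

```python
import re


def split_sentences(text, max_chars=200):
    """Split text into sentence-like chunks no longer than max_chars where possible."""
    # Split after ., !, or ? followed by whitespace. This naive rule will
    # mis-split abbreviations such as "Dr. Smith".
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


book = "It was a bright day. The clocks were striking! Was anyone listening? Nobody was."
for chunk in split_sentences(book, max_chars=45):
    print(chunk)
```

Each chunk could then be passed to the model as a separate input text and the resulting waveforms concatenated. A single sentence longer than `max_chars` is kept whole rather than cut mid-sentence.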

🚀 fastspeech2-en-ljspeech

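FastSpeech 2 is a text-to-speech model from fairseq S² that converts text to speech quickly and with high quality. This model supports English, uses a single-speaker female voice, and was trained on the LJSpeech dataset.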
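
Part of the speed of FastSpeech-style models comes from being non-autoregressive: the model predicts a duration for each phoneme and expands the phoneme-level sequence to frame level in one step, instead of generating frames one at a time. The toy sketch below illustrates that expansion (often called length regulation). It is not the fairseq implementation; `length_regulate`, the phoneme labels, and the durations are made up for illustration.

```python
def length_regulate(phoneme_states, durations):
    """Repeat each phoneme-level state durations[i] times to reach frame level."""
    if len(phoneme_states) != len(durations):
        raise ValueError("need exactly one duration per phoneme state")
    frames = []
    for state, n_frames in zip(phoneme_states, durations):
        frames.extend([state] * n_frames)
    return frames


# Toy input: three phoneme "states" (labels stand in for hidden vectors)
# with made-up predicted durations of 2, 1, and 3 spectrogram frames.
states = ["HH", "AH", "L"]
durations = [2, 1, 3]
print(length_regulate(states, durations))
# → ['HH', 'HH', 'AH', 'L', 'L', 'L']
```

In a real model the states are hidden vectors and the durations come from a learned predictor; the expanded sequence is what the decoder turns into a mel-spectrogram.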

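🚀 Quick Start

The FastSpeech 2 text-to-speech model is built on fairseq S² and offers fast, efficient speech synthesis. The following example code synthesizes speech with the model: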

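from fairseq.checkpoint_utils import load_model_ensemble_and_task_from_hf_hub
from fairseq.models.text_to_speech.hub_interface import TTSHubInterface
import IPython.display as ipd


# Download the checkpoint from the Hugging Face Hub and select the HiFi-GAN vocoder.
models, cfg, task = load_model_ensemble_and_task_from_hf_hub(
    "facebook/fastspeech2-en-ljspeech",
    arg_overrides={"vocoder": "hifigan", "fp16": False}
)
model = models[0]
TTSHubInterface.update_cfg_with_data_cfg(cfg, task.data_cfg)
# Some fairseq versions expect a list of models here; if this call fails,
# try task.build_generator([model], cfg) instead.
generator = task.build_generator(model, cfg)

text = "Hello, this is a test run."

# Convert the text into model input, then synthesize a waveform and its sample rate.
sample = TTSHubInterface.get_model_input(task, text)
wav, rate = TTSHubInterface.get_prediction(task, model, generator, sample)

# Play the audio inline (requires a Jupyter/IPython environment).
ipd.Audio(wav, rate=rate)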

For more detailed examples, see the fairseq S² examples.
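
Outside a notebook, `ipd.Audio` has nowhere to play the result, so it can be useful to write the waveform to disk instead. The following is a minimal sketch using only the Python standard library. `write_wav` is a hypothetical helper, not a fairseq function, and it assumes the samples are mono floats in roughly the range [-1, 1]. A synthetic tone stands in for model output so the block runs without downloading the checkpoint.

```python
import math
import os
import struct
import tempfile
import wave


def write_wav(path, samples, rate):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip to avoid integer overflow
        frames += struct.pack("<h", int(round(s * 32767)))
    with wave.open(path, "wb") as f:
        f.setnchannels(1)   # single channel
        f.setsampwidth(2)   # 2 bytes = 16 bits per sample
        f.setframerate(int(rate))
        f.writeframes(bytes(frames))


# Stand-in for model output: one second of a 440 Hz tone at 22050 Hz.
rate = 22050
tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
out_path = os.path.join(tempfile.gettempdir(), "fastspeech2_demo.wav")
write_wav(out_path, tone, rate)
```

With the output of the quick-start code, a call such as `write_wav("hello.wav", wav.tolist(), rate)` should work, assuming `wav` is a one-dimensional tensor of float samples.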

📄 License

Citation

If you use this model, please cite it as follows:

@inproceedings{wang-etal-2021-fairseq,
    title = "fairseq S{\^{}}2: A Scalable and Integrable Speech Synthesis Toolkit",
    author = "Wang, Changhan  and
      Hsu, Wei-Ning  and
      Adi, Yossi  and
      Polyak, Adam  and
      Lee, Ann  and
      Chen, Peng-Jen  and
      Gu, Jiatao  and
      Pino, Juan",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-demo.17",
    doi = "10.18653/v1/2021.emnlp-demo.17",
    pages = "143--152",
}