kan-bayashi_ljspeech open-source text-to-speech model: free to deploy for natural speech synthesis


Kan Bayashi Ljspeech Joint Finetune Conformer Fastspeech2 Hifigan

Developed by espnet

This is a text-to-speech (TTS) model based on ESPnet2. It was trained on the LJSpeech dataset and combines the Conformer, FastSpeech2, and HiFi-GAN architectures.

Speech synthesis, English #Joint fine-tuning TTS #Conformer architecture #HiFi-GAN vocoder

Downloads: 20

Release date: 3/2/2022

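Model overview

This model is a high-quality English text-to-speech system that converts text input into natural, fluent speech output.

Model features

Integrated architecture

Combines the sequence-modeling capability of Conformer, the efficient synthesis of FastSpeech2, and the high-quality HiFi-GAN vocoder.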
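
The combination above is a two-stage pipeline: the Conformer-based FastSpeech2 acoustic model predicts mel-spectrogram frames from text, and the HiFi-GAN vocoder upsamples those frames into a waveform. The sketch below only illustrates how the two stages relate in length. The names `FrontendConfig`, `frames_to_samples`, and `frames_to_seconds` are hypothetical, and the 22050 Hz sample rate and 256-sample hop are assumed values commonly used with LJSpeech; they are not stated on this page and should be checked against the model's configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrontendConfig:
    """Assumed audio settings; verify against the model's own config."""
    sample_rate: int = 22050  # assumption: common LJSpeech sampling rate
    hop_length: int = 256     # assumption: waveform samples per mel frame


def frames_to_samples(num_frames: int, cfg: FrontendConfig) -> int:
    """Waveform samples a vocoder produces when it upsamples each frame by hop_length."""
    if num_frames < 0:
        raise ValueError("num_frames must be non-negative")
    return num_frames * cfg.hop_length


def frames_to_seconds(num_frames: int, cfg: FrontendConfig) -> float:
    """Approximate audio duration in seconds for a given number of mel frames."""
    return frames_to_samples(num_frames, cfg) / cfg.sample_rate
```

Under these assumed settings, each mel frame corresponds to roughly 11.6 ms of audio, so the acoustic model works on a sequence far shorter than the waveform the vocoder emits.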

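High-quality speech

Generates natural, fluent English speech.

ESPnet2 integration

Built on the ESPnet2 framework, which makes it easy to integrate with other speech processing tools.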

Model capabilities

Text-to-speech

English speech synthesis

Use cases

Speech synthesis applications

Audiobook generation

Converts e-book text into natural speech.

Produces high-quality English audiobooks.
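
For long inputs such as book chapters, one common approach is to split the text into sentence-aligned chunks and synthesize each chunk separately. The following is a minimal sketch of that preprocessing step only. The helper `split_into_chunks` and the 200-character default are hypothetical choices for illustration, not part of ESPnet or of this model.

```python
import re
from typing import List


def split_into_chunks(text: str, max_chars: int = 200) -> List[str]:
    """Group sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut
    mid-sentence, so a chunk may exceed the limit in that case.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]

    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be passed to the TTS model in order and the resulting audio segments concatenated.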

Voice assistants

Provides natural speech output for smart devices.

Makes the user experience feel more natural.

🚀 ESPnet2 TTS pretrained model

This is a pretrained text-to-speech model built with the espnet framework. It was trained on the ljspeech dataset and produces high-quality synthesized speech.

🚀 Quick start

This model was trained by kan-bayashi using the ljspeech/tts1 recipe in espnet. ♻️ Imported from https://zenodo.org/record/5498896/

💻 Usage examples

Basic usage

# coming soon
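
Until an official example is published, a minimal sketch based on the general ESPnet2 inference interface (`Text2Speech.from_pretrained`) might look like the following. It assumes that `espnet`, `espnet_model_zoo`, and `soundfile` are installed. The model tag is an assumption derived from the model name on this page and should be verified before use, and `synthesize_to_file` is a hypothetical helper name.

```python
# Assumed model tag, derived from the model name on this page; verify before use.
MODEL_TAG = "espnet/kan-bayashi_ljspeech_joint_finetune_conformer_fastspeech2_hifigan"


def synthesize_to_file(text: str, out_path: str = "out.wav",
                       model_tag: str = MODEL_TAG, device: str = "cpu") -> str:
    """Synthesize English text and write the waveform to out_path (hypothetical helper)."""
    # Imported lazily so that this module loads even when ESPnet is not installed.
    import soundfile as sf
    from espnet2.bin.tts_inference import Text2Speech

    # Downloads (on first use) and loads the pretrained model.
    text2speech = Text2Speech.from_pretrained(model_tag=model_tag, device=device)

    # The call returns a dict; "wav" holds the generated waveform tensor.
    wav = text2speech(text)["wav"]

    # Write 16-bit PCM audio at the model's sampling rate.
    sf.write(out_path, wav.view(-1).cpu().numpy(), text2speech.fs, "PCM_16")
    return out_path
```

Calling `synthesize_to_file("Hello, this is a test.")` would write `out.wav` to the current directory, provided the assumed model tag resolves.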

📄 License

This model is provided under the CC BY 4.0 license.

📚 Documentation

Citing ESPnet

@inproceedings{watanabe2018espnet,
  author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson {Enrique Yalta Soplin} and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
  title={{ESPnet}: End-to-End Speech Processing Toolkit},
  year={2018},
  booktitle={Proceedings of Interspeech},
  pages={2207--2211},
  doi={10.21437/Interspeech.2018-1456},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
}
@inproceedings{hayashi2020espnet,
  title={{ESPnet-TTS}: Unified, reproducible, and integratable open source end-to-end text-to-speech toolkit},
  author={Hayashi, Tomoki and Yamamoto, Ryuichi and Inoue, Katsuki and Yoshimura, Takenori and Watanabe, Shinji and Toda, Tomoki and Takeda, Kazuya and Zhang, Yu and Tan, Xu},
  booktitle={Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={7654--7658},
  year={2020},
  organization={IEEE}
}

or arXiv:

@misc{watanabe2018espnet,
  title={{ESPnet}: End-to-End Speech Processing Toolkit},
  author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson {Enrique Yalta Soplin} and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
  year={2018},
  eprint={1804.00015},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}