kan-bayashi_ljspeech_tacotron2: Open-source Text-to-Speech Model Trained on the LJSpeech Dataset

Kan Bayashi Ljspeech Tacotron2

Developed by espnet

Tacotron2 text-to-speech model trained with the ESPnet framework on the LJSpeech dataset

Speech Synthesis, English
#English TTS #High-fidelity speech synthesis #End-to-end model

Downloads 40

Release Time : 3/2/2022

Model Overview

This is a text-to-speech (TTS) model based on the Tacotron2 architecture that converts English text into natural-sounding speech. It was trained on the LJSpeech dataset and is suitable for English speech synthesis applications.

Model Features

High-quality speech synthesis

Built on the Tacotron2 architecture, which generates natural, fluent speech

ESPnet framework support

Trained with the ESPnet toolkit, so it can be loaded and extended with ESPnet's own tooling

Standard dataset training

Trained on LJSpeech, a widely used public single-speaker English speech dataset

Model Capabilities

English text-to-speech

Speech synthesis

Use Cases

Speech applications

Audiobook generation

Automatically convert e-book text into speech

Generate natural and fluent audiobooks

Voice assistant

Provide speech output capabilities for smart devices

Enable a more natural voice interaction experience

🚀 Example ESPnet2 TTS model

This is an ESPnet2 TTS model for text-to-speech tasks, trained on the LJSpeech dataset.

🚀 Quick Start

`kan-bayashi/ljspeech_tacotron2`

♻️ Imported from https://zenodo.org/record/3989498/

This model was trained by kan-bayashi using the ljspeech/tts1 recipe in ESPnet.

💻 Usage Examples

Basic Usage

# coming soon
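Until an official example is published, the following is a minimal sketch of how the model might be used, not the authors' documented procedure. It assumes that the `espnet` and `espnet_model_zoo` packages are installed and that the tag `kan-bayashi/ljspeech_tacotron2` resolves through `Text2Speech.from_pretrained` in `espnet2.bin.tts_inference`. No vocoder is specified here, in which case ESPnet2 is expected to fall back to Griffin-Lim for waveform generation. The helper names `float_to_pcm16`, `save_wav`, and `synthesize` are hypothetical and introduced only for illustration.

```python
import struct
import wave

# Model tag taken from the Quick Start section above.
MODEL_TAG = "kan-bayashi/ljspeech_tacotron2"


def float_to_pcm16(samples):
    """Convert float samples in [-1.0, 1.0] to little-endian 16-bit PCM bytes."""
    # Clip first so out-of-range values do not overflow the 16-bit range.
    clipped = [max(-1.0, min(1.0, float(s))) for s in samples]
    ints = [int(s * 32767) for s in clipped]
    return struct.pack("<%dh" % len(ints), *ints)


def save_wav(path, samples, sample_rate):
    """Write mono 16-bit audio to a WAV file using only the standard library."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        f.setframerate(int(sample_rate))
        f.writeframes(float_to_pcm16(samples))


def synthesize(text, model_tag=MODEL_TAG):
    """Return (waveform, sample_rate) for the given English text.

    Assumption: ESPnet2 and espnet_model_zoo are installed. The import is
    done lazily so the helpers above work without ESPnet.
    """
    from espnet2.bin.tts_inference import Text2Speech

    tts = Text2Speech.from_pretrained(model_tag=model_tag)
    output = tts(text)  # dictionary of outputs; "wav" holds the waveform tensor
    return output["wav"].view(-1).cpu().numpy(), tts.fs
```

With the dependencies in place, calling `wav, fs = synthesize("Hello from Tacotron2.")` followed by `save_wav("out.wav", wav, fs)` should write a mono WAV file. The first call downloads the pretrained weights, so it needs network access.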

📚 Documentation

Citing ESPnet

@inproceedings{watanabe2018espnet,
  author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson {Enrique Yalta Soplin} and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
  title={{ESPnet}: End-to-End Speech Processing Toolkit},
  year={2018},
  booktitle={Proceedings of Interspeech},
  pages={2207--2211},
  doi={10.21437/Interspeech.2018-1456},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
}
@inproceedings{hayashi2020espnet,
  title={{Espnet-TTS}: Unified, reproducible, and integratable open source end-to-end text-to-speech toolkit},
  author={Hayashi, Tomoki and Yamamoto, Ryuichi and Inoue, Katsuki and Yoshimura, Takenori and Watanabe, Shinji and Toda, Tomoki and Takeda, Kazuya and Zhang, Yu and Tan, Xu},
  booktitle={Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={7654--7658},
  year={2020},
  organization={IEEE}
}

or arXiv:

@misc{watanabe2018espnet,
      title={ESPnet: End-to-End Speech Processing Toolkit}, 
      author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Enrique Yalta Soplin and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
      year={2018},
      eprint={1804.00015},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

📄 License

This model is released under the CC BY 4.0 license.

Property	Details
Tags	espnet, audio, text-to-speech
Datasets	ljspeech
License	cc-by-4.0
