Kan-bayashi LJ Speech VITS Open-source Text-to-Speech Model

Developed by espnet

A VITS-based text-to-speech model trained using the ESPnet framework on the LJSpeech dataset, supporting English speech synthesis.

Speech Synthesis, English #High-quality speech synthesis #End-to-end TTS #VITS architecture

Downloads 2,780

Release Time: 3/2/2022

Model Overview

This model is an end-to-end text-to-speech (TTS) model based on the VITS architecture, capable of converting English text into natural speech.

Model Features

End-to-end speech synthesis

Uses the VITS architecture to convert text directly to a waveform in a single model, without hand-engineered intermediate features or a separate vocoder stage

High-quality speech output

Trained on the LJSpeech dataset to generate natural and fluent English speech

ESPnet integration

Fully compatible with the ESPnet ecosystem for easy deployment and integration

Model Capabilities

English text-to-speech

High-quality speech synthesis

Use Cases

Speech synthesis applications

Audiobook generation

Automatically convert e-book text into speech

Generate natural and fluent audiobooks

Voice assistants

Provide speech output capabilities for smart assistants

Enhance user experience with natural voice interaction
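For the audiobook use case, one common approach is to split long text into sentence-sized chunks and synthesize each chunk separately. The sketch below shows only the splitting step. The helper name `split_into_chunks` and the 200-character default are illustrative assumptions, not part of ESPnet or of this model.

```python
import re


def split_into_chunks(text, max_chars=200):
    """Group sentences into chunks of at most max_chars characters.

    Hypothetical helper for illustration. A single sentence longer than
    max_chars is kept whole rather than cut mid-sentence.
    """
    # Split after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be passed to the TTS model in turn and the resulting audio segments concatenated.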

🚀 ESPnet2 TTS Pretrained Model

This is a pretrained TTS model for ESPnet2. It was trained on the LJSpeech dataset and can be integrated into projects built on ESPnet.

🚀 Quick Start

This model was trained by kan-bayashi using the ljspeech/tts1 recipe in espnet. It was imported from https://zenodo.org/record/5443814/.

💻 Usage Examples

Basic Usage

# coming soon
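Until an official example is published, the following is a minimal sketch of how the model might be loaded and used through ESPnet2's `Text2Speech` inference class. It assumes that `espnet` and `espnet_model_zoo` are installed and that the model is available under the tag `espnet/kan-bayashi_ljspeech_vits`; the tag and the helper names `load_model`, `float_to_pcm16`, and `synthesize_to_wav` are assumptions made for illustration.

```python
import struct
import wave

# Assumed model tag; check the hosting page for the exact identifier.
MODEL_TAG = "espnet/kan-bayashi_ljspeech_vits"


def load_model(model_tag=MODEL_TAG):
    """Load the pretrained model (downloads it on first use)."""
    # Imported lazily so the helpers below work without ESPnet installed.
    from espnet2.bin.tts_inference import Text2Speech
    return Text2Speech.from_pretrained(model_tag)


def float_to_pcm16(samples):
    """Pack float samples in [-1, 1] as little-endian 16-bit PCM bytes."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack("<%dh" % len(ints), *ints)


def synthesize_to_wav(tts, text, path):
    """Synthesize `text` and write a mono 16-bit WAV file to `path`.

    `tts` is any callable that returns a dict with a 1-D "wav" entry and
    exposes the sampling rate as `tts.fs`, as Text2Speech does.
    Returns the number of samples written.
    """
    wav = tts(text)["wav"]
    samples = wav.tolist() if hasattr(wav, "tolist") else list(wav)
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        f.setframerate(tts.fs)
        f.writeframes(float_to_pcm16(samples))
    return len(samples)


# Example (needs ESPnet installed; downloads the model on first use):
# tts = load_model()
# synthesize_to_wav(tts, "Hello from ESPnet.", "out.wav")
```

The WAV file is written with the standard-library `wave` module to keep the sketch free of extra dependencies; ESPnet's own examples typically use the `soundfile` package for this step instead.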

📄 License

This project is licensed under the CC-BY-4.0 license.

📚 Documentation

Citing ESPnet

If you use this model, please cite the following papers:

@inproceedings{watanabe2018espnet,
  author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson {Enrique Yalta Soplin} and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
  title={{ESPnet}: End-to-End Speech Processing Toolkit},
  year={2018},
  booktitle={Proceedings of Interspeech},
  pages={2207--2211},
  doi={10.21437/Interspeech.2018-1456},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
}
@inproceedings{hayashi2020espnet,
  title={{Espnet-TTS}: Unified, reproducible, and integratable open source end-to-end text-to-speech toolkit},
  author={Hayashi, Tomoki and Yamamoto, Ryuichi and Inoue, Katsuki and Yoshimura, Takenori and Watanabe, Shinji and Toda, Tomoki and Takeda, Kazuya and Zhang, Yu and Tan, Xu},
  booktitle={Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={7654--7658},
  year={2020},
  organization={IEEE}
}

or arXiv:

@misc{watanabe2018espnet,
      title={ESPnet: End-to-End Speech Processing Toolkit}, 
      author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Enrique Yalta Soplin and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
      year={2018},
      eprint={1804.00015},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Property	Details
Tags	espnet, audio, text-to-speech
Language	en
Datasets	ljspeech
License	cc-by-4.0
