Kan-bayashi Libritts Xvector VITS Open-source Text-to-Speech Model

Developed by espnet

A text-to-speech model built with the ESPnet framework and trained on the LibriTTS dataset. It supports English speech synthesis.

Speech Synthesis, English #High-quality speech synthesis #Multi-speaker support #x-vector speaker embedding

Downloads 61

Release Time: 3/2/2022

Model Overview

This model is an end-to-end text-to-speech (TTS) model capable of converting input English text into natural speech output.

Model Features

High-quality speech synthesis

Capable of generating natural and fluent English speech

End-to-end architecture

Utilizes the VITS architecture for direct text-to-speech conversion

x-vector support

Conditions synthesis on x-vector speaker embeddings, which potentially enables control over speaker characteristics
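
An x-vector is a fixed-length speaker embedding, so steering the voice amounts to choosing which embedding is passed to the model. The sketch below illustrates that idea with NumPy only. The 512-dimensional size, the random stand-in vectors, and the helper names are all assumptions for illustration. Blending two embeddings is a common heuristic and is not something this model card confirms will yield a usable voice.

```python
import numpy as np

XVECTOR_DIM = 512  # assumed embedding size, not stated in the model card


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two speaker embeddings."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        raise ValueError("embeddings must be non-zero")
    return float(np.dot(a, b) / denom)


def blend_speakers(emb_a: np.ndarray, emb_b: np.ndarray, alpha: float) -> np.ndarray:
    """Linearly blend two embeddings: alpha=0 returns emb_a, alpha=1 returns emb_b."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * emb_a + alpha * emb_b


# Random vectors stand in for real x-vectors extracted from reference speech.
rng = np.random.default_rng(0)
speaker_a = rng.standard_normal(XVECTOR_DIM).astype(np.float32)
speaker_b = rng.standard_normal(XVECTOR_DIM).astype(np.float32)

# A blend weighted towards speaker A stays closer to A than to B.
mostly_a = blend_speakers(speaker_a, speaker_b, alpha=0.25)
```

In practice the embeddings would come from an x-vector extractor run on reference recordings of the target speakers, and the chosen or blended vector would be supplied to the model at inference time.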

Model Capabilities

English text-to-speech

High-quality speech synthesis

Use Cases

Speech synthesis applications

Audiobook generation

Convert e-book text into speech

Generate natural and fluent audiobooks

Voice assistants

Provide speech output capabilities for smart devices

Enable more natural voice interactions

🚀 ESPnet2 TTS Pretrained Model

This is a pretrained text-to-speech (TTS) model from ESPnet2. It was trained on the LibriTTS dataset and synthesizes English speech from text.

🚀 Quick Start

`kan-bayashi/libritts_xvector_vits`

♻️ Imported from https://zenodo.org/record/5521416/

This model was trained by kan-bayashi using the libritts/tts1 recipe in ESPnet.

💻 Usage Examples

Basic Usage

# coming soon
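
Until an official example is published, the following is a minimal sketch of how inference might look with ESPnet2's `Text2Speech` interface. It assumes `espnet` and `espnet_model_zoo` are installed and that the tag shown in Quick Start resolves through `Text2Speech.from_pretrained`. Because this is an x-vector model, a speaker embedding is passed as `spembs`. The 512-dimensional random vector used here is a placeholder assumption, not a real speaker, and the helper names are hypothetical.

```python
import numpy as np

# Model tag as shown in the Quick Start section of this page.
MODEL_TAG = "kan-bayashi/libritts_xvector_vits"


def make_placeholder_xvector(dim: int = 512, seed: int = 0) -> np.ndarray:
    """Return a random vector standing in for a real x-vector.

    The dimension is an assumption; replace this with an embedding
    extracted from reference speech for meaningful speaker control.
    """
    rng = np.random.default_rng(seed)
    return rng.standard_normal(dim).astype(np.float32)


def synthesize(text: str, spembs: np.ndarray, model_tag: str = MODEL_TAG):
    """Synthesize `text` and return (waveform as a NumPy array, sampling rate)."""
    # Imported lazily so the helpers above work without ESPnet installed.
    from espnet2.bin.tts_inference import Text2Speech

    # Downloads and caches the pretrained model on first use.
    tts = Text2Speech.from_pretrained(model_tag)
    output = tts(text, spembs=spembs)
    wav = output["wav"].detach().cpu().numpy()
    return wav, tts.fs
```

A typical call would be `wav, fs = synthesize("Hello, world.", make_placeholder_xvector())`, after which the waveform can be written to disk, for example with `soundfile.write("out.wav", wav, fs)`.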

📚 Documentation

Citing ESPnet

@inproceedings{watanabe2018espnet,
  author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson {Enrique Yalta Soplin} and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
  title={{ESPnet}: End-to-End Speech Processing Toolkit},
  year={2018},
  booktitle={Proceedings of Interspeech},
  pages={2207--2211},
  doi={10.21437/Interspeech.2018-1456},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
}
@inproceedings{hayashi2020espnet,
  title={{Espnet-TTS}: Unified, reproducible, and integratable open source end-to-end text-to-speech toolkit},
  author={Hayashi, Tomoki and Yamamoto, Ryuichi and Inoue, Katsuki and Yoshimura, Takenori and Watanabe, Shinji and Toda, Tomoki and Takeda, Kazuya and Zhang, Yu and Tan, Xu},
  booktitle={Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={7654--7658},
  year={2020},
  organization={IEEE}
}

or arXiv:

@misc{watanabe2018espnet,
      title={ESPnet: End-to-End Speech Processing Toolkit}, 
      author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Enrique Yalta Soplin and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
      year={2018},
      eprint={1804.00015},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

📄 License

This model is released under the CC BY 4.0 license (cc-by-4.0).

Property	Details
Tags	espnet, audio, text-to-speech
Datasets	libritts
License	cc-by-4.0
