Kan-bayashi CSMSC VITS: Open-Source Text-to-Speech Model for Mandarin Chinese

kan-bayashi/csmsc_vits

Developed by espnet

This is a text-to-speech (TTS) model trained with the ESPnet2 framework. It uses the VITS architecture and supports Mandarin Chinese.

Speech Synthesis · Chinese · #Chinese Speech Synthesis · #VITS Architecture · #High-Quality TTS

Downloads 37

Release date: 3/2/2022

Model Overview

This model is an end-to-end text-to-speech model capable of converting Chinese text into natural and fluent speech output.

Model Features

End-to-End Speech Synthesis

Uses the VITS architecture for end-to-end text-to-speech conversion, replacing the separate acoustic-model and vocoder stages of traditional speech synthesis pipelines

High-Quality Speech Output

Capable of generating natural and fluent Mandarin Chinese speech

ESPnet2 Framework Support

Built on ESPnet2, a mature end-to-end speech processing toolkit

Model Capabilities

Chinese Text-to-Speech

Mandarin Speech Synthesis

Use Cases

Voice Interaction

Smart Voice Assistant

Provides Chinese speech output capabilities for smart devices

Accessibility Services

Text-to-Speech

Helps visually impaired individuals access textual information

🚀 ESPnet2 TTS Pretrained Model

This is a pre-trained text-to-speech model in ESPnet2, trained on the CSMSC dataset, offering high-quality speech synthesis capabilities.

🚀 Quick Start

The model kan-bayashi/csmsc_vits was imported from https://zenodo.org/record/5499120/. It was trained by kan-bayashi using the csmsc/tts1 recipe in espnet.

💻 Usage Examples

Basic Usage

# coming soon
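
The model card does not yet provide an official example. The following is a minimal sketch of how a pretrained ESPnet2 TTS model is typically loaded and run, not the author's documented usage. It assumes that the `espnet`, `espnet_model_zoo`, and `soundfile` packages are installed, that the tag `kan-bayashi/csmsc_vits` resolves through `Text2Speech.from_pretrained`, and that the Mandarin text frontend's dependencies are available. The function name `synthesize` and the output path are hypothetical.

```python
from pathlib import Path

# Tag taken from the model card; that it resolves via espnet_model_zoo is an assumption.
MODEL_TAG = "kan-bayashi/csmsc_vits"


def synthesize(text: str, out_path: str = "out.wav", model_tag: str = MODEL_TAG) -> Path:
    """Synthesize `text` to a WAV file and return the path that was written."""
    # Heavy dependencies are imported lazily so that importing this module stays cheap.
    import soundfile as sf
    from espnet2.bin.tts_inference import Text2Speech

    # Downloads and caches the pretrained model on first use.
    tts = Text2Speech.from_pretrained(model_tag)

    # The inference call returns a dict; "wav" holds the waveform as a torch tensor.
    wav = tts(text)["wav"]

    # tts.fs is the sampling rate the model was trained with.
    sf.write(out_path, wav.view(-1).cpu().numpy(), tts.fs, "PCM_16")
    return Path(out_path)
```

Calling `synthesize("今天天气很好。")` would, under these assumptions, write the synthesized speech to `out.wav` in the working directory. The first call needs network access to fetch the model files.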

📄 License

This project is licensed under the CC-BY-4.0 license.

📚 Documentation

Citing ESPnet

If you use this model, please cite the following papers:

@inproceedings{watanabe2018espnet,
  author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson {Enrique Yalta Soplin} and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
  title={{ESPnet}: End-to-End Speech Processing Toolkit},
  year={2018},
  booktitle={Proceedings of Interspeech},
  pages={2207--2211},
  doi={10.21437/Interspeech.2018-1456},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
}
@inproceedings{hayashi2020espnet,
  title={{Espnet-TTS}: Unified, reproducible, and integratable open source end-to-end text-to-speech toolkit},
  author={Hayashi, Tomoki and Yamamoto, Ryuichi and Inoue, Katsuki and Yoshimura, Takenori and Watanabe, Shinji and Toda, Tomoki and Takeda, Kazuya and Zhang, Yu and Tan, Xu},
  booktitle={Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={7654--7658},
  year={2020},
  organization={IEEE}
}

or arXiv:

@misc{watanabe2018espnet,
      title={ESPnet: End-to-End Speech Processing Toolkit}, 
      author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Enrique Yalta Soplin and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
      year={2018},
      eprint={1804.00015},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
