Kan-bayashi JVS TTS Finetune Open-source Japanese Text-to-Speech Model - Supports High-quality Japanese Speech Synthesis

Kan Bayashi Jvs Tts Finetune Jvs001 Jsut Vits Raw Phn Jaconv Pyopenjta Truncated 178804

Developed by ESPnet

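This is a Japanese text-to-speech (TTS) model trained with the ESPnet framework and fine-tuned on the JVS dataset. It supports high-quality Japanese speech synthesis. Judging by its name, it was fine-tuned on JVS speaker jvs001 starting from a VITS model pretrained on JSUT.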

Speech Synthesis · Japanese · #Japanese TTS #High-fidelity Speech Synthesis #VITS Architecture

Downloads 19

Release Time: 3/2/2022

Model Overview

This model is a Japanese text-to-speech system that converts input Japanese text into natural, fluent speech. It is based on the VITS architecture and uses jaconv for text normalization and pyopenjtalk for grapheme-to-phoneme conversion.

Model Features

High-Quality Speech Synthesis

Capable of generating natural and fluent Japanese speech output.

Based on VITS Architecture

An end-to-end TTS system that generates waveforms directly from text, trained with variational inference and adversarial training.
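For readers who want the math behind "variational inference and adversarial training", the objective from the original VITS paper (Kim et al., 2021) is restated below. A conditional VAE maximizes a lower bound on the log-likelihood of speech $x$ given text $c$ through a latent $z$, and a discriminator $D$ adds a least-squares GAN loss on the generated waveform ($y$ is the real waveform). This follows the paper; the loss weights and any ESPnet-specific changes used for this particular checkpoint are not stated in this model card.

```latex
% Evidence lower bound (ELBO) maximized by the conditional VAE
\log p_\theta(x \mid c) \ge
  \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)
  - \log \frac{q_\phi(z \mid x)}{p_\theta(z \mid c)}\right]

% Generator-side training loss (loss weights omitted)
L_{\mathrm{vae}} = L_{\mathrm{recon}} + L_{\mathrm{kl}} + L_{\mathrm{dur}}
  + L_{\mathrm{adv}}(G) + L_{\mathrm{fm}}(G),
\qquad L_{\mathrm{recon}} = \lVert x_{\mathrm{mel}} - \hat{x}_{\mathrm{mel}} \rVert_1

% Least-squares adversarial losses for discriminator D and generator G
L_{\mathrm{adv}}(D) = \mathbb{E}_{(y, z)}\big[(D(y) - 1)^2 + D(G(z))^2\big],
\qquad L_{\mathrm{adv}}(G) = \mathbb{E}_{z}\big[(D(G(z)) - 1)^2\big]
```

Here $L_{\mathrm{kl}}$ is the KL term of the ELBO, $L_{\mathrm{dur}}$ is the duration-predictor loss, and $L_{\mathrm{fm}}$ is the discriminator feature-matching loss.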

Supports Pause Handling

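The model can reproduce natural pauses: its text frontend (`pyopenjtalk_accent_with_pause`, per the model name) represents pauses as explicit symbols in the input phoneme sequence.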
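To make "pause handling" concrete: with a frontend like `pyopenjtalk_accent_with_pause` (part of the model name), pauses are assumed to appear as explicit symbols in the phoneme sequence the model reads. The toy sketch below only illustrates that idea and is not ESPnet's or pyopenjtalk's code. The punctuation set, the `pau` token, and the function name `insert_pauses` are hypothetical, and we assume the real frontend derives pause positions from pyopenjtalk's linguistic analysis rather than from raw punctuation alone.

```python
from typing import List

# Hypothetical set of punctuation marks treated as pause points (illustration only).
PAUSE_MARKS = {"、", "。", "，", "．", "？", "！", ",", ".", "?", "!"}
# Hypothetical pause symbol; the real model's token inventory is defined by its config.
PAUSE_TOKEN = "pau"


def insert_pauses(tokens: List[str]) -> List[str]:
    """Replace each run of punctuation with a single pause token.

    Pauses at the very start or end are dropped, on the assumption that
    utterance-boundary silence is handled separately from phrase-internal pauses.
    """
    out: List[str] = []
    for tok in tokens:
        if tok in PAUSE_MARKS:
            # Collapse consecutive punctuation into one pause; skip leading ones.
            if out and out[-1] != PAUSE_TOKEN:
                out.append(PAUSE_TOKEN)
        else:
            out.append(tok)
    # Drop a trailing pause left by sentence-final punctuation.
    if out and out[-1] == PAUSE_TOKEN:
        out.pop()
    return out
```

For example, `insert_pauses(["a", "、", "、", "i", "。"])` returns `["a", "pau", "i"]`: the doubled comma becomes one pause and the sentence-final period adds none.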

Pitch Control

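Models Japanese pitch-accent variation: the frontend also supplies accent information (the `accent` in `pyopenjtalk_accent_with_pause`) to the model.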
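As background for the pitch claim: Tokyo Japanese is a pitch-accent language, where a word's accent type (the position of its accent nucleus) fixes its High/Low pitch pattern across morae. Assuming the frontend passes accent information derived from pyopenjtalk (our reading of the `accent` part of the model name), the model is conditioned on patterns like the ones this toy function computes. `pitch_pattern` is a hypothetical helper that implements only the textbook rule; it is not part of ESPnet or pyopenjtalk.

```python
def pitch_pattern(n_morae: int, accent: int, with_particle: bool = False) -> str:
    """Return the H/L pitch pattern of a Tokyo-Japanese word.

    accent is the accent-nucleus position: 0 = unaccented (heiban),
    k >= 1 = pitch falls after the k-th mora. If with_particle is True,
    the pitch of a following particle (e.g. が) is appended.
    """
    if n_morae < 1:
        raise ValueError("a word needs at least one mora")
    if not 0 <= accent <= n_morae:
        raise ValueError("accent must be between 0 and n_morae")
    marks = []
    for i in range(1, n_morae + 1):
        if accent == 1:
            high = i == 1  # head-high: only the first mora is high
        elif i == 1:
            high = False  # otherwise the first mora starts low
        else:
            # High up to the nucleus, or to the end if the word is unaccented.
            high = accent == 0 or i <= accent
        marks.append("H" if high else "L")
    if with_particle:
        # A following particle stays high only if the word has no accent nucleus.
        marks.append("H" if accent == 0 else "L")
    return "".join(marks)
```

The classic はし minimal triple shows why this matters for TTS: 箸 (chopsticks, type 1) gives `pitch_pattern(2, 1, True) == "HLL"`, 橋 (bridge, type 2) gives `"LHL"`, and 端 (edge, type 0) gives `"LHH"`. The same kana are pronounced with different pitch depending on the word.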

Model Capabilities

Japanese Text-to-Speech

Speech Synthesis

Pitch Control

Use Cases

Voice Assistants

Smart Customer Service Voice

Provides natural speech output for Japanese customer service systems.

Makes interactions sound more natural and improves the user experience.

Audiobook Content Creation

E-book Narration

Converts Japanese text content into speech.

Helps visually impaired users access text content and supports multimodal content.

🚀 ESPnet2 TTS Pretrained Model

This is a pretrained TTS model for ESPnet2 that provides high-quality text-to-speech. It was trained on the JVS dataset and is intended for Japanese speech synthesis.

🚀 Quick Start

This section will guide you through the basic steps of using this ESPnet2 TTS pretrained model.

✨ Features

Dataset-based training: trained with the `jvs/tts1` recipe in ESPnet on the JVS dataset.
Imported from Zenodo: ♻️ https://zenodo.org/record/5432540/

📦 Installation

No specific installation steps are provided in the original README.

💻 Usage Examples

Basic Usage

No usage example is provided in the original README yet (it is marked "coming soon").
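Since no official example is given, here is a minimal sketch, assuming `espnet`, `espnet_model_zoo`, `torch`, `pyopenjtalk`, and `jaconv` are installed and that the Model Name from the Model Information table resolves through `Text2Speech.from_pretrained` via the ESPnet model zoo. `synthesize`, `float_to_pcm16`, `encode_wav`, and `write_wav` are hypothetical helper names. Because VITS generates the waveform directly, no separate vocoder is loaded. WAV output uses only the Python standard library, and ESPnet is imported lazily so the audio helpers work on their own.

```python
import array
import io
import sys
import wave
from typing import List, Sequence, Tuple

# Model name from the "Model Information" table; assumed to resolve via espnet_model_zoo.
MODEL_TAG = (
    "kan-bayashi/jvs_tts_finetune_jvs001_jsut_vits_raw_phn_jaconv_"
    "pyopenjtalk_accent_with_pause_latest"
)


def synthesize(text: str, model_tag: str = MODEL_TAG) -> Tuple[List[float], int]:
    """Synthesize Japanese text; returns (float samples, sampling rate).

    ESPnet is imported here so the WAV helpers below work without it.
    """
    from espnet2.bin.tts_inference import Text2Speech

    tts = Text2Speech.from_pretrained(model_tag=model_tag)  # downloads on first use
    wav = tts(text)["wav"]  # torch.Tensor of float samples
    return wav.cpu().reshape(-1).tolist(), tts.fs


def float_to_pcm16(samples: Sequence[float]) -> List[int]:
    """Clip floats to [-1, 1] and scale them to signed 16-bit integers."""
    return [int(round(max(-1.0, min(1.0, float(s))) * 32767)) for s in samples]


def encode_wav(samples: Sequence[float], sample_rate: int) -> bytes:
    """Encode mono float samples as a 16-bit PCM WAV file in memory."""
    pcm = array.array("h", float_to_pcm16(samples))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())
    return buf.getvalue()


def write_wav(path: str, samples: Sequence[float], sample_rate: int) -> None:
    """Write the encoded WAV bytes to disk."""
    with open(path, "wb") as f:
        f.write(encode_wav(samples, sample_rate))
```

With the dependencies installed, `samples, fs = synthesize("こんにちは。")` followed by `write_wav("out.wav", samples, fs)` should produce a playable file. Converting the tensor to a Python list keeps the sketch dependency-free but is slow for long audio; NumPy with `soundfile` would be the more usual choice in practice.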

📚 Documentation

Model Information

Property	Details
Model Name	`kan-bayashi/jvs_tts_finetune_jvs001_jsut_vits_raw_phn_jaconv_pyopenjtalk_accent_with_pause_latest`
Model Type	ESPnet2 TTS pretrained model
Training Data	JVS
License	cc-by-4.0

Citing ESPnet

If you use this model, please cite the following papers:

@inproceedings{watanabe2018espnet,
  author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson {Enrique Yalta Soplin} and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
  title={{ESPnet}: End-to-End Speech Processing Toolkit},
  year={2018},
  booktitle={Proceedings of Interspeech},
  pages={2207--2211},
  doi={10.21437/Interspeech.2018-1456},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
}
@inproceedings{hayashi2020espnet,
  title={{Espnet-TTS}: Unified, reproducible, and integratable open source end-to-end text-to-speech toolkit},
  author={Hayashi, Tomoki and Yamamoto, Ryuichi and Inoue, Katsuki and Yoshimura, Takenori and Watanabe, Shinji and Toda, Tomoki and Takeda, Kazuya and Zhang, Yu and Tan, Xu},
  booktitle={Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={7654--7658},
  year={2020},
  organization={IEEE}
}

Or, for the arXiv version:

@misc{watanabe2018espnet,
      title={ESPnet: End-to-End Speech Processing Toolkit}, 
      author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Enrique Yalta Soplin and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
      year={2018},
      eprint={1804.00015},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

📄 License

This model is released under the cc-by-4.0 license.
