This is a Japanese text-to-speech model based on parler-tts-large-v1. It specializes in Japanese speech synthesis and aims for high-quality voice generation while remaining lightweight.
Model Features
Japanese Speech Synthesis
Speech synthesis capability specifically optimized for Japanese, generating natural and fluent Japanese speech
High-Quality Output
Produces high-quality voice output while keeping the model lightweight
Speaker Control
Supports controlling voice characteristics, such as pitch and speech rate, through natural-language descriptions
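As a minimal sketch of description-based control, the helper below assembles an English voice description from a few coarse settings. The function name `build_description`, the option names, and every phrase other than the default wording are assumptions made for illustration; they are not part of the model's API, and how strongly the model responds to each phrase has not been verified here. The default settings reproduce the description used in the usage example of this model card.

```python
# Hypothetical helper: these names and phrase tables are illustrative
# assumptions, not part of parler_tts or of this model's repository.
PITCH_PHRASES = {
    "low": "a slightly low-pitched voice",
    "moderate": "a moderate-pitched voice",
    "high": "a slightly high-pitched voice",
}
SPEED_PHRASES = {
    "slow": "a slow speed",
    "moderate": "a moderate speed",
    "fast": "a fast speed",
}
PRONOUNS = {"female": "her", "male": "his"}


def build_description(gender="female", pitch="high", speed="moderate"):
    """Compose an English voice description from coarse settings."""
    if gender not in PRONOUNS:
        raise ValueError(f"unknown gender: {gender!r}")
    if pitch not in PITCH_PHRASES:
        raise ValueError(f"unknown pitch: {pitch!r}")
    if speed not in SPEED_PHRASES:
        raise ValueError(f"unknown speed: {speed!r}")
    return (
        f"A {gender} speaker with {PITCH_PHRASES[pitch]} "
        f"delivers {PRONOUNS[gender]} words at {SPEED_PHRASES[speed]} "
        "with a quite monotone tone in a confined environment, "
        "resulting in a quite clear audio recording."
    )


if __name__ == "__main__":
    print(build_description())
    print(build_description(gender="male", pitch="low", speed="slow"))
```

The resulting string would be passed to the tokenizer as the description, in the same way as the hand-written description in the usage example.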
Model Capabilities
Japanese Text-to-Speech
Voice Feature Control
High-Quality Voice Generation
Use Cases
Speech Synthesis Applications
Voice Assistants
Provides natural voice output for Japanese voice assistants
Generates natural and fluent Japanese speech
Audiobooks
Converts Japanese text into audiobooks
Produces clear speech suitable for prolonged listening
Japanese Parler-TTS Large (Beta)
This repository publishes a model retrained from parler-tts/parler-tts-large-v1 to support Japanese text-to-speech. The model provides high-quality speech generation while remaining lightweight.
⚠️ Important Note
This model is not compatible with the tokenizer used in the original Parler-TTS. It uses its own dedicated tokenizer.
⚠️ Important Note
This repository is currently in beta. Features and the model itself are still being optimized ahead of the official release.
Japanese Parler-TTS Large can generate high-quality, richly expressive speech, but it may behave unstably because training is not yet sufficient. If stability is a priority, use the lighter and more stable Japanese Parler-TTS Mini instead.
⚠️ Important Note
The training data contains relatively few male voices, so generated male voices may not meet expectations. In particular, natural intonation and sound quality may be difficult to control.
import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf
from rubyinserter import add_ruby

# Use the first GPU when available, otherwise fall back to the CPU.
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the model and its dedicated tokenizer from the same repository.
model = ParlerTTSForConditionalGeneration.from_pretrained("2121-8/japanese-parler-tts-large-bate").to(device)
tokenizer = AutoTokenizer.from_pretrained("2121-8/japanese-parler-tts-large-bate")

# Japanese text to speak, and an English description of the desired voice.
prompt = "こんにちは、今日はどのようにお過ごしですか?"
description = "A female speaker with a slightly high-pitched voice delivers her words at a moderate speed with a quite monotone tone in a confined environment, resulting in a quite clear audio recording."

# Add ruby (reading) annotations to the prompt before tokenizing it.
prompt = add_ruby(prompt)
input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

# Generate the waveform and save it as a WAV file.
generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio_arr = generation.cpu().numpy().squeeze()
sf.write("parler_tts_japanese_out.wav", audio_arr, model.config.sampling_rate)
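For longer inputs, such as the audiobook use case, one possible approach is to split the text into sentences and synthesize it in smaller chunks. The sketch below covers only the text-splitting step, using the standard library. The helper names `split_sentences` and `group_sentences` and the `max_chars` limit are assumptions for illustration; this model card does not state a maximum input length, so any limit would need to be tuned by ear.

```python
import re


def split_sentences(text):
    """Split Japanese text after sentence-ending punctuation, keeping the punctuation."""
    # Zero-width split after each of 。!? and their ASCII forms
    # (splitting on empty matches requires Python 3.7 or later).
    parts = re.split(r"(?<=[。!?!?])", text)
    return [p.strip() for p in parts if p.strip()]


def group_sentences(sentences, max_chars=100):
    """Greedily pack sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    chunks = []
    current = ""
    for sentence in sentences:
        if current and len(current) + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current += sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    text = "こんにちは。今日はどのようにお過ごしですか?元気です!"
    for chunk in group_sentences(split_sentences(text), max_chars=20):
        print(chunk)
```

Each chunk could then go through `add_ruby`, the tokenizer, and `model.generate` exactly as in the example above, with the resulting audio arrays joined (for instance with `numpy.concatenate`) before writing the WAV file.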
Sample Audio
🎯 How to Specify a Particular Speaker
In preparation.
Acknowledgments
We would like to thank the following people for providing the resources for the development of this model:
Without their contributions, this project would not have been possible.
License
This model and repository are permitted for a wide range of uses, including research, education, and commercial use. However, please comply with the following conditions:
Commercial Use Conditions
You can use the speech and products generated by this model for commercial purposes, but selling the model itself (files, weight data, etc.) is prohibited.
Disclaimer of Appropriateness
The creator does not guarantee the accuracy, legality, or appropriateness of the results obtained from using this model.
User's Responsibility
When using this model, please comply with all applicable laws and regulations. The user bears all responsibility arising from the generated content.
Creator's Disclaimer
The creator of this repository and model is not responsible for any copyright infringement or other legal issues.
Response to Deletion Requests
In case of copyright issues, the problematic resources or data will be promptly deleted.