This model is primarily used to convert Japanese text into natural speech, suitable for applications such as voice assistants and audiobooks.
Model Features
Japanese Support
A text-to-speech model specifically optimized for Japanese, delivering high-quality speech synthesis.
Compact Design
The model has a small size, making it suitable for deployment in resource-limited environments.
High-Quality Speech
Trained on the LibriTTS dataset, it generates natural and smooth speech.
Model Capabilities
Japanese Text-to-Speech
High-Quality Speech Synthesis
Use Cases
Voice Assistants
Japanese Voice Assistant
Provides voice interaction for Japanese users
Generates natural and smooth Japanese speech responses
Audiobooks
Japanese Audiobooks
Converts Japanese text into speech
Delivers a high-quality reading experience
🚀 Japanese Parler-TTS Mini (β Version)
This repository releases a model retrained from parler-tts/parler-tts-mini-v1 to support text-to-speech in Japanese. The model delivers high-quality speech generation while remaining lightweight.
Note: This model is not compatible with the tokenizer used by the original Parler-TTS; it uses its own dedicated tokenizer.
This repository is currently in β. Feature and model optimization is in progress ahead of the official release.
import torch
import soundfile as sf
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
from rubyinserter import add_ruby

device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the retrained model and its dedicated tokenizer
model = ParlerTTSForConditionalGeneration.from_pretrained("2121-8/japanese-parler-tts-mini-bate").to(device)
tokenizer = AutoTokenizer.from_pretrained("2121-8/japanese-parler-tts-mini-bate")

prompt = "こんにちは、今日はどのようにお過ごしですか?"
description = "A female speaker with a slightly high-pitched voice delivers her words at a moderate speed with a quite monotone tone in a confined environment, resulting in a quite clear audio recording."

# Annotate the Japanese text with ruby (furigana) readings before tokenization
prompt = add_ruby(prompt)

input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio_arr = generation.cpu().numpy().squeeze()
sf.write("parler_tts_japanese_out.wav", audio_arr, model.config.sampling_rate)
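For repeated synthesis, the generation steps can be wrapped in a small helper. This is only a sketch, not part of the official API: `generate_speech` is a hypothetical name, and the model, tokenizer, and device are passed in explicitly so the helper itself carries no loading logic.

```python
# Hypothetical helper wrapping the generation steps above.
# model/tokenizer/device are the objects loaded in the quick-start snippet.
def generate_speech(model, tokenizer, device, prompt, description):
    """Synthesize `prompt` guided by `description`; returns a 1-D audio array."""
    input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
    prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
    generation = model.generate(input_ids=input_ids,
                                prompt_input_ids=prompt_input_ids)
    return generation.cpu().numpy().squeeze()
```

With the objects from the quick-start snippet, a call might look like `audio_arr = generate_speech(model, tokenizer, device, add_ruby(prompt), description)`, followed by `sf.write` as above.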
Sample Audio
Advanced Usage
Support for specifying a particular speaker is in preparation.
📚 Documentation
📄 License
This model and repository may be used for a wide range of purposes, including research, education, and commercial applications. However, please comply with the following conditions:
Commercial Use Conditions
You can use the speech and results generated by this model for commercial purposes, but selling the model itself (files, weight data, etc.) is prohibited.
Disclaimer of Appropriateness
The creator makes no guarantee regarding the accuracy, legality, or appropriateness of the results obtained from using this model.
User's Responsibility
When using this model, please comply with all applicable laws and regulations. All responsibilities arising from the generated content belong to the user.
Creator's Disclaimer
The creator of this repository and model assumes no responsibility for copyright infringement or other legal issues.
Response to Deletion Requests
In the event of a copyright issue, the problematic resources or data will be promptly deleted.
⚠️ Important Note
Due to the composition of the training data, there is limited data for male voices. As a result, generating male voices may not work as expected; in particular, natural intonation and sound quality may be difficult to achieve.
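Given this limitation, it helps to state the desired voice attributes explicitly in the description prompt. The strings below are illustrative examples only (the exact wording is an assumption; as in the quick-start snippet, the description is free-form English text):

```python
# Illustrative description prompts (wording is an assumption; the description
# is free-form English text, as in the quick-start example).
female_description = (
    "A female speaker with a slightly high-pitched voice delivers her words "
    "at a moderate speed, resulting in a quite clear audio recording."
)
# Male voices are under-represented in the training data, so output from a
# description like this may sound less natural.
male_description = (
    "A male speaker with a low-pitched voice delivers his words at a moderate "
    "speed in a quiet environment, resulting in a clear audio recording."
)
```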
💡 Usage Tip
The official release versions of Japanese Parler-TTS Mini and Large are in preparation. Stay tuned for updates.