This model combination targets Japanese text-to-speech: Parler-TTS Mini v1, a lightweight text-to-speech model, handles speech synthesis, while Retrieva-JP T5 Base Long, a Japanese language model, handles text processing.
Model Features
Lightweight Speech Synthesis
Parler-TTS Mini v1 is a lightweight speech synthesis model suitable for resource-constrained environments.
Japanese Text Processing
Retrieva-JP T5 Base Long is optimized specifically for Japanese text, providing high-quality text processing capabilities.
Efficient Combination
The combination of these two models offers an efficient solution for Japanese text-to-speech tasks.
Model Capabilities
Japanese Text-to-Speech
Speech Synthesis
Japanese Text Processing
Use Cases
Voice Assistants
Japanese Voice Assistant
Used to build Japanese voice assistants that convert text into natural speech.
Generates fluent Japanese speech output
Education
Japanese Learning Tool
Used in Japanese learning applications to help learners hear correct pronunciations.
Provides accurate Japanese pronunciation
Japanese Parler-TTS Mini
This repository publishes a model retrained from parler-tts/parler-tts-mini-v1 to support Japanese text-to-speech. It provides high-quality voice generation while remaining lightweight.
Important Note
This model is not compatible with the tokenizer used by the original Parler-TTS; it uses its own custom tokenizers instead.
import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf
from rubyinserter import add_ruby

# Use the GPU when available, otherwise fall back to the CPU
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the Japanese model and its two custom tokenizers (stored in subfolders of the repository)
model = ParlerTTSForConditionalGeneration.from_pretrained("2121-8/japanese-parler-tts-mini").to(device)
prompt_tokenizer = AutoTokenizer.from_pretrained("2121-8/japanese-parler-tts-mini", subfolder="prompt_tokenizer")
description_tokenizer = AutoTokenizer.from_pretrained("2121-8/japanese-parler-tts-mini", subfolder="description_tokenizer")

# prompt: the Japanese text to speak; description: an English description of the desired voice
prompt = "こんにちは、今日はどのようにお過ごしですか？"
description = "A female speaker with a slightly high-pitched voice delivers her words at a moderate speed with a quite monotone tone in a confined environment, resulting in a quite clear audio recording."

# Add ruby (reading) annotations to the Japanese text before tokenizing it
prompt = add_ruby(prompt)

# Tokenize each input with its own tokenizer and move the tensors to the chosen device
input_ids = description_tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = prompt_tokenizer(prompt, return_tensors="pt").input_ids.to(device)

# Generate the waveform, drop singleton dimensions, and save it at the model's sampling rate
generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio_arr = generation.cpu().numpy().squeeze()
sf.write("parler_tts_japanese_out.wav", audio_arr, model.config.sampling_rate)
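If you would rather not depend on soundfile, the final sf.write step can be approximated with Python's standard library. This is a minimal sketch, assuming the generated audio is a 1-D sequence of floats in the range [-1.0, 1.0]; it writes mono 16-bit PCM, which is also soundfile's default subtype for WAV files. The helper name write_pcm16_wav is hypothetical and not part of this repository.

```python
import struct
import wave
from typing import Sequence


def write_pcm16_wav(path: str, samples: Sequence[float], sampling_rate: int) -> None:
    """Write mono float samples to a 16-bit PCM WAV file (hypothetical helper).

    Assumes samples are floats nominally in [-1.0, 1.0]; anything outside
    that range is clipped before conversion to int16.
    """
    frames = bytearray()
    for x in samples:
        x = max(-1.0, min(1.0, float(x)))  # clip to the valid range
        frames += struct.pack("<h", int(round(x * 32767)))  # little-endian int16
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)  # a 1-D array (as produced by .squeeze()) is one channel
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wf.setframerate(sampling_rate)
        wf.writeframes(bytes(frames))
```

With the variables from the example above, the call would be write_pcm16_wav("parler_tts_japanese_out.wav", audio_arr, model.config.sampling_rate). The per-sample Python loop keeps the sketch dependency-free but is slow for long clips, so soundfile remains the better choice when it is available.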
The description can also refer to a speaker by name. The following variant uses the same code with a description that names the JSUT speaker and asks for an expressive, fast-paced delivery:
import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf
from rubyinserter import add_ruby

device = "cuda:0" if torch.cuda.is_available() else "cpu"

model = ParlerTTSForConditionalGeneration.from_pretrained("2121-8/japanese-parler-tts-mini").to(device)
prompt_tokenizer = AutoTokenizer.from_pretrained("2121-8/japanese-parler-tts-mini", subfolder="prompt_tokenizer")
description_tokenizer = AutoTokenizer.from_pretrained("2121-8/japanese-parler-tts-mini", subfolder="description_tokenizer")

prompt = "こんにちは、今日はどのようにお過ごしですか？"
# The description names the speaker (JSUT) in addition to the style of delivery
description = "JSUT speaks with an expressive and animated tone in an excellent recording, with a very close-sounding proximity that suggests a private and intimate setting, and delivers her words at a rapid pace."

prompt = add_ruby(prompt)

input_ids = description_tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = prompt_tokenizer(prompt, return_tensors="pt").input_ids.to(device)

generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio_arr = generation.cpu().numpy().squeeze()
sf.write("parler_tts_japanese_out.wav", audio_arr, model.config.sampling_rate)
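Both descriptions are free-form English sentences that vary a handful of attributes: the speaker, pitch, speed, tone, recording environment, and clarity. As a convenience for experimenting with these attributes, a description can be assembled from a template. This is a hypothetical helper, not part of parler_tts or this repository; its template simply mirrors the first example description, and how well the model responds to other attribute values is not documented here.

```python
from typing import Optional


def build_description(
    pitch: str = "slightly high-pitched",
    speed: str = "moderate",
    tone: str = "quite monotone",
    environment: str = "a confined environment",
    quality: str = "quite clear",
    speaker: Optional[str] = None,
) -> str:
    """Compose a voice description from attributes (hypothetical helper).

    With the default arguments this reproduces the first example description.
    Passing speaker (e.g. "JSUT") replaces the generic "A female speaker"
    subject; note the template hard-codes the pronoun "her".
    """
    subject = speaker if speaker else "A female speaker"
    return (
        f"{subject} with a {pitch} voice delivers her words at a {speed} speed "
        f"with a {tone} tone in {environment}, "
        f"resulting in a {quality} audio recording."
    )
```

The returned string is passed to description_tokenizer exactly like the hand-written descriptions above, for example description = build_description(speaker="JSUT", speed="rapid").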
License
This model and repository may be used for a wide range of purposes, including research, education, and commercial use, subject to the following conditions:
Commercial Use Conditions
You may use the voices and other outputs generated by this model for commercial purposes, but selling the model itself (files, weights, etc.) is prohibited.
Disclaimer of Appropriateness
The creator does not guarantee the accuracy, legality, or appropriateness of the results obtained from using this model.
User's Responsibility
When using this model, please comply with all applicable laws and regulations. All responsibilities arising from the generated content belong to the user.
Creator's Disclaimer
The creator of this repository and model is not responsible for any copyright infringement or other legal issues.
Response to Deletion Requests
In the event of a copyright issue, the problematic resources or data will be promptly deleted.