Japanese Parler-TTS Mini: Open-Source Combined Model for Free and Efficient Japanese Text-to-Speech

Japanese Parler-TTS Mini

Developed by 2121-8

Parler-TTS Mini v1 is a lightweight text-to-speech model, while Retrieva-JP T5 Base Long is a Japanese text processing model. They are combined for Japanese text-to-speech tasks.

Speech Synthesis

Transformers

Japanese | Open Source License: Other | #Japanese TTS #Lightweight Speech Synthesis #Long Text Optimization

Downloads 1,250

Release Time: 12/2/2024

Model Overview

This model combination focuses on Japanese text-to-speech tasks, with Parler-TTS Mini v1 handling speech synthesis and Retrieva-JP T5 Base Long managing text processing.

Model Features

Lightweight Speech Synthesis

Parler-TTS Mini v1 is a lightweight speech synthesis model suitable for resource-constrained environments.

Japanese Text Processing

Retrieva-JP T5 Base Long is optimized specifically for Japanese text, providing high-quality text processing capabilities.

Efficient Combination

The combination of these two models offers an efficient solution for Japanese text-to-speech tasks.

Model Capabilities

Japanese Text-to-Speech

Speech Synthesis

Japanese Text Processing

Use Cases

Voice Assistants

Japanese Voice Assistant

Used to build Japanese voice assistants that convert text into natural speech.

Generates fluent Japanese speech output

Education

Japanese Learning Tool

Used in Japanese learning applications to help learners hear correct pronunciations.

Provides accurate Japanese pronunciation

🚀 Japanese Parler-TTS Mini

This repository publishes a model retrained from parler-tts/parler-tts-mini-v1 for Japanese text-to-speech. The model is lightweight while still providing high-quality voice generation.

⚠️ Important Note

This model is not compatible with the tokenizer used in the original Parler-TTS; it uses its own dedicated tokenizer.

Property	Details
Language	Japanese
Base Model	parler-tts/parler-tts-mini-v1, retrieva-jp/t5-base-long
Pipeline Tag	text-to-speech
Library Name	transformers
Tags	text-to-speech, annotation, Japanese
License	other

🚀 Quick Start

📦 Installation

You can install it using the following commands:

pip install git+https://github.com/huggingface/parler-tts.git
pip install git+https://github.com/getuka/RubyInserter.git
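
After installing, it can help to confirm that the modules imported by the usage examples are actually available. The following is a minimal sketch, not part of the original repository: the helper name `missing_modules` is hypothetical, and the module list is simply taken from the import statements in the examples in this document. Which of these modules the two pip commands pull in automatically is not stated here.

```python
from importlib.util import find_spec

# Import names taken from the usage examples in this document.
REQUIRED_MODULES = ("torch", "transformers", "parler_tts", "rubyinserter", "soundfile")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found in this environment.

    Hypothetical helper for illustration; it only checks importability,
    not versions or GPU support.
    """
    return [name for name in names if find_spec(name) is None]
```

Calling `missing_modules()` returns an empty list when everything is importable; otherwise it lists the names that still need to be installed.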

💻 Usage Examples

🔍 Japanese Parler-TTS Index

Japanese Parler-TTS Mini
Japanese Parler-TTS Large (to be trained if sufficient computing resources become available)

📖 Quick Index

👨‍💻 Installation
🎲 Usage with Random Voices
🎯 Specifying a Particular Speaker

🎲 Usage with Random Voices

import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf
from rubyinserter import add_ruby

# Use the first GPU if one is available, otherwise fall back to the CPU
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the model and its two dedicated tokenizers (not the original Parler-TTS tokenizer)
model = ParlerTTSForConditionalGeneration.from_pretrained("2121-8/japanese-parler-tts-mini").to(device)
prompt_tokenizer = AutoTokenizer.from_pretrained("2121-8/japanese-parler-tts-mini", subfolder="prompt_tokenizer")
description_tokenizer = AutoTokenizer.from_pretrained("2121-8/japanese-parler-tts-mini", subfolder="description_tokenizer")

# prompt: the Japanese text to speak; description: the desired voice, written in English
prompt = "こんにちは、今日はどのようにお過ごしですか？"
description = "A female speaker with a slightly high-pitched voice delivers her words at a moderate speed with a quite monotone tone in a confined environment, resulting in a quite clear audio recording."

# Preprocess the prompt with RubyInserter's add_ruby, then tokenize both inputs
prompt = add_ruby(prompt)
input_ids = description_tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = prompt_tokenizer(prompt, return_tensors="pt").input_ids.to(device)

# Generate the waveform and save it as a WAV file at the model's sampling rate
generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio_arr = generation.cpu().numpy().squeeze()
sf.write("parler_tts_japanese_out.wav", audio_arr, model.config.sampling_rate)
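
The description in the example above is a single English sentence made up of several voice attributes. As a rough sketch, it could be assembled from parts so that individual attributes are easier to vary. The helper `build_description` and its parameter names are hypothetical and not part of this repository or of Parler-TTS; the defaults reproduce the example sentence, and which other attribute values the model actually responds to is an assumption to verify by listening.

```python
def build_description(
    gender="female",
    pitch="slightly high-pitched",
    speed="moderate",
    tone="quite monotone",
    environment="confined",
    clarity="quite clear",
):
    """Assemble an English voice description in the style of the example above.

    Hypothetical helper: the sentence template mirrors the example description,
    and the attribute vocabulary is an assumption, not a documented API.
    """
    # Pick a pronoun that matches the stated gender; default to a neutral one.
    pronoun = {"female": "her", "male": "his"}.get(gender, "their")
    return (
        f"A {gender} speaker with a {pitch} voice delivers {pronoun} words "
        f"at a {speed} speed with a {tone} tone in a {environment} environment, "
        f"resulting in a {clarity} audio recording."
    )
```

For instance, `build_description(gender="male", pitch="low-pitched")` yields a description for a different voice that can be passed to `description_tokenizer` in place of the hard-coded string.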

🎯 Specifying a Particular Speaker

Training data used: JSUT

import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf
from rubyinserter import add_ruby

# Use the first GPU if one is available, otherwise fall back to the CPU
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the model and its two dedicated tokenizers (not the original Parler-TTS tokenizer)
model = ParlerTTSForConditionalGeneration.from_pretrained("2121-8/japanese-parler-tts-mini").to(device)
prompt_tokenizer = AutoTokenizer.from_pretrained("2121-8/japanese-parler-tts-mini", subfolder="prompt_tokenizer")
description_tokenizer = AutoTokenizer.from_pretrained("2121-8/japanese-parler-tts-mini", subfolder="description_tokenizer")

# The description starts with the speaker name ("JSUT") to select that voice
prompt = "こんにちは、今日はどのようにお過ごしですか？"
description = "JSUT speaks with an expressive and animated tone in an excellent recording, with a very close-sounding proximity that suggests a private and intimate setting, and delivers her words at a rapid pace."

# Preprocess the prompt with RubyInserter's add_ruby, then tokenize both inputs
prompt = add_ruby(prompt)
input_ids = description_tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = prompt_tokenizer(prompt, return_tensors="pt").input_ids.to(device)

# Generate the waveform and save it as a WAV file at the model's sampling rate
generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio_arr = generation.cpu().numpy().squeeze()
sf.write("parler_tts_japanese_out.wav", audio_arr, model.config.sampling_rate)
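
In the example above, the only difference from the random-voice case is that the description opens with a speaker name instead of a generic subject. A small sketch of that pattern follows. The helper `named_speaker_description` and the `KNOWN_SPEAKERS` set are hypothetical; this section documents only JSUT, so the sketch rejects other names rather than guessing at speakers the model may not know.

```python
# Only "JSUT" is documented in this section; any other entry would be an assumption.
KNOWN_SPEAKERS = {"JSUT"}

# Traits copied from the example description above (everything after the speaker name).
DEFAULT_TRAITS = (
    "speaks with an expressive and animated tone in an excellent recording, "
    "with a very close-sounding proximity that suggests a private and intimate setting, "
    "and delivers her words at a rapid pace."
)


def named_speaker_description(speaker="JSUT", traits=DEFAULT_TRAITS):
    """Build a description that opens with a speaker name, as in the JSUT example.

    Hypothetical helper for illustration; raises ValueError for undocumented names.
    """
    if speaker not in KNOWN_SPEAKERS:
        raise ValueError(
            f"Speaker {speaker!r} is not documented here; known: {sorted(KNOWN_SPEAKERS)}"
        )
    return f"{speaker} {traits}"
```

With the defaults, the function returns the same description string used in the example, which can then be tokenized with `description_tokenizer` as before.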

📄 License

This model and repository may be used for a wide range of purposes, including research, education, and commercial use, provided that you comply with the following conditions:

Commercial Use Conditions

You may use the voices and outputs generated by this model for commercial purposes, but selling the model itself (files, weight data, etc.) is prohibited.

Disclaimer of Appropriateness

The creator does not guarantee the accuracy, legality, or appropriateness of the results obtained from using this model.

User's Responsibility

When using this model, please comply with all applicable laws and regulations. The user bears all responsibility arising from the generated content.

Creator's Disclaimer

The creator of this repository and model is not responsible for any copyright infringement or other legal issues.

Response to Deletion Requests

If a copyright issue arises, the problematic resources or data will be promptly deleted.
