Open-source Japanese TTS model: japanese-parler-tts-large-bate

Japanese Parler-TTS Large Beta

Developed by 2121-8

A Japanese text-to-speech model fine-tuned from parler-tts-large-v1 that generates high-quality Japanese speech.

Task: Speech Synthesis
Library: Transformers
Language: Japanese
Open-source license: Other
Tags: Japanese Speech Synthesis, High-Quality Voice Generation, Lightweight TTS

Downloads 114

Release Date: 11/19/2024

Model Overview

This model adapts parler-tts-large-v1 to Japanese. It specializes in Japanese speech synthesis and delivers high-quality voice generation while remaining lightweight.

Model Features

Japanese Speech Synthesis

Speech synthesis capability specifically optimized for Japanese, generating natural and fluent Japanese speech

High-Quality Output

Produces high-quality voice output while keeping the model lightweight

Speaker Control

Supports controlling voice characteristics such as pitch and speech rate through a text description of the speaker

Model Capabilities

Japanese Text-to-Speech

Voice Feature Control

High-Quality Voice Generation

Use Cases

Speech Synthesis Applications

Voice Assistants

Provides natural voice output for Japanese voice assistants

Generates natural and fluent Japanese speech

Audiobooks

Converts Japanese text into audiobooks

Produces clear speech suitable for prolonged listening
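
For audiobook-style input, long text is usually easier to handle when it is split into sentences and synthesized one sentence at a time. The sketch below is a minimal, deliberately naive splitter; `split_sentences` is a hypothetical helper, not part of this model or of Parler-TTS, and this card does not state a maximum input length, so per-sentence generation is an assumption rather than a documented requirement.

```python
import re

# A sentence is a run of non-terminator characters followed by any
# trailing Japanese or ASCII sentence-ending marks (。！？!?).
_SENTENCE = re.compile(r"[^。！？!?]+[。！？!?]*")


def split_sentences(text):
    """Split Japanese text into sentences, keeping the ending punctuation.

    Hypothetical helper: it does not treat closing quotes such as 」
    specially, so quoted speech may be split inside the quotation.
    """
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


if __name__ == "__main__":
    for sentence in split_sentences("こんにちは。元気ですか？はい"):
        print(sentence)
```

Each returned sentence could then be passed through the generation steps shown in the usage example later in this card, with the resulting audio arrays concatenated before writing the output file.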

🚀 Japanese Parler-TTS Large (Beta)

This repository publishes a model retrained from parler-tts/parler-tts-large-v1 to support Japanese text-to-speech. The model provides high-quality speech generation while remaining lightweight.

⚠️ Important Note

This model is not compatible with the tokenizer used by the original Parler-TTS. It uses its own tokenizer, which must be loaded from this model's repository.
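
One way to avoid accidentally pairing this model with the original Parler-TTS tokenizer is to check that the tokenizer is loaded from the same repository as the model. The guard below is a minimal sketch; `check_tokenizer_source` is a hypothetical helper introduced for illustration, and comparing repository IDs is an assumed convention, not something the library enforces.

```python
MODEL_ID = "2121-8/japanese-parler-tts-large-bate"


def check_tokenizer_source(tokenizer_id, model_id=MODEL_ID):
    """Return tokenizer_id if it matches the model repository, else raise.

    Hypothetical guard: it only compares repository IDs and does not
    inspect the tokenizer files themselves.
    """
    if tokenizer_id != model_id:
        raise ValueError(
            f"Tokenizer '{tokenizer_id}' does not come from '{model_id}'; "
            "this model uses its own tokenizer."
        )
    return tokenizer_id


# Intended use (requires the transformers package and network access):
#   from transformers import AutoTokenizer
#   tokenizer = AutoTokenizer.from_pretrained(check_tokenizer_source(MODEL_ID))
```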

⚠️ Important Note

This repository is currently in beta. Feature and model optimizations are in progress towards the official release.

Official Release URLs

Japanese Parler-TTS Mini (878M)
Japanese Parler-TTS Large (878M) is in preparation.

📦 Model Information

Property	Details
Model Type	Based on parler-tts/parler-tts-large-v1 and retrieva-jp/t5-base-long
Training Data	ylacombe/libritts_r_filtered, ylacombe/libritts-r-filtered-descriptions-10k-v5-without-accents
Pipeline Tag	text-to-audio
Library Name	transformers
Tags	text-to-speech, annotation, japanese
License	other

✨ Japanese Parler-TTS Index

Japanese Parler-TTS Mini
Japanese Parler-TTS Large (to be trained if sufficient computing resources are available)
Japanese Parler-TTS Mini Beta
Japanese Parler-TTS Large Beta

⚠️ Important Note

Japanese Parler-TTS Large can generate expressive, high-quality speech, but it may behave unstably because it has not yet been trained sufficiently. If stability is a priority, the lighter and more stable Japanese Parler-TTS Mini is recommended.

⚠️ Important Note

The training data contains relatively few male voices, so generated male voices may fall short of expectations. In particular, natural intonation and good sound quality may be difficult to achieve.

🚀 Quick Start

👨‍💻 Installation
🎲 Usage with Random Voices
🎯 Method of Specifying a Specific Speaker
Acknowledgments

💻 Usage Examples

👨‍💻 Installation

Install the required packages with the following commands:

pip install git+https://github.com/huggingface/parler-tts.git
pip install git+https://github.com/getuka/RubyInserter.git
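
After installation, it can help to confirm that the modules imported by the usage example are actually available. The following is a minimal sketch using only the standard library; the module list is taken from the imports in the usage example, and `missing_modules` is a hypothetical helper name. Whether `soundfile` is pulled in automatically by the commands above is an assumption, so it is checked explicitly.

```python
from importlib.util import find_spec

# Top-level import names used by the usage example in this card.
REQUIRED_MODULES = ["torch", "transformers", "parler_tts", "rubyinserter", "soundfile"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All modules used by the usage example are importable.")
```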

🎲 Usage with Random Voices

import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf
from rubyinserter import add_ruby

# Use the first GPU when available; otherwise fall back to the CPU.
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the model and its own tokenizer from the same repository.
model = ParlerTTSForConditionalGeneration.from_pretrained("2121-8/japanese-parler-tts-large-bate").to(device)
tokenizer = AutoTokenizer.from_pretrained("2121-8/japanese-parler-tts-large-bate")

# prompt: the Japanese text to be spoken.
# description: an English description of the desired voice and recording.
prompt = "こんにちは、今日はどのようにお過ごしですか？"
description = "A female speaker with a slightly high-pitched voice delivers her words at a moderate speed with a quite monotone tone in a confined environment, resulting in a quite clear audio recording."

# Annotate the prompt with RubyInserter, then tokenize both inputs.
prompt = add_ruby(prompt)
input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

# Generate the waveform and save it as a WAV file at the model's sampling rate.
generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio_arr = generation.cpu().numpy().squeeze()
sf.write("parler_tts_japanese_out.wav", audio_arr, model.config.sampling_rate)
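
Because voice characteristics are controlled through the `description` string, it can be convenient to assemble that string from individual attributes. The sketch below follows the sentence pattern of the example description above; `build_description` and its parameter names are hypothetical, and which attribute wordings the model actually responds to is an assumption that would need to be checked by listening to the output.

```python
def build_description(gender="female", pitch="slightly high-pitched",
                      speed="moderate", tone="quite monotone",
                      environment="confined", clarity="quite clear"):
    """Assemble an English voice description in the pattern of the usage example.

    Hypothetical helper: the defaults reproduce the example description;
    other values are not guaranteed to be understood by the model.
    """
    pronoun = {"female": "her", "male": "his"}.get(gender, "their")
    return (
        f"A {gender} speaker with a {pitch} voice delivers {pronoun} words "
        f"at a {speed} speed with a {tone} tone in a {environment} environment, "
        f"resulting in a {clarity} audio recording."
    )


if __name__ == "__main__":
    print(build_description(pitch="low-pitched", speed="slow"))
```

The returned string can be used in place of `description` in the example above. As noted earlier in this card, male voices are underrepresented in the training data, so descriptions requesting a male speaker may give less reliable results.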

🎯 Method of Specifying a Specific Speaker

In preparation.

Acknowledgments

We would like to thank the following people for providing the resources for the development of this model:

Without their contributions, this project would not have been possible.

📄 License

This model and repository may be used for a wide range of purposes, including research, education, and commercial use, provided that you comply with the following conditions:

Commercial Use Conditions
You can use the speech and products generated by this model for commercial purposes, but selling the model itself (files, weight data, etc.) is prohibited.
Disclaimer of Appropriateness
The creator does not guarantee the accuracy, legality, or appropriateness of the results obtained from using this model.
User's Responsibility
When using this model, please comply with all applicable laws and regulations. The user bears all responsibility arising from the generated content.
Creator's Disclaimer
The creator of this repository and model is not responsible for any copyright infringement or other legal issues.
Response to Deletion Requests
In case of copyright issues, the problematic resources or data will be promptly deleted.
