This model is primarily used to convert Japanese text into natural speech, suitable for applications such as voice assistants and audiobooks.
Model Features
Japanese Support
A text-to-speech model specifically optimized for Japanese, delivering high-quality speech synthesis.
Compact Design
The model has a small size, making it suitable for deployment in resource-limited environments.
High-Quality Speech
Trained on the LibriTTS dataset, it generates natural and smooth speech.
Model Capabilities
Japanese Text-to-Speech
High-Quality Speech Synthesis
Use Cases
Voice Assistants
Japanese Voice Assistant
Provides voice interaction for Japanese users
Generates natural and smooth Japanese speech responses
Audiobooks
Japanese Audiobooks
Converts Japanese text into speech
Delivers a high-quality reading experience
🚀 Japanese Parler-TTS Mini (β Version)
This repository releases a model retrained from parler-tts/parler-tts-mini-v1 to support text-to-speech in Japanese. The model delivers high-quality speech generation while remaining lightweight.
Note: This model is not compatible with the tokenizer used by the original Parler-TTS; it uses its own dedicated tokenizer.
This repository is currently in β. Feature and model optimization is in progress ahead of the official release.
import torch
import soundfile as sf
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
from rubyinserter import add_ruby

device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the retrained model and its dedicated tokenizer
model = ParlerTTSForConditionalGeneration.from_pretrained("2121-8/japanese-parler-tts-mini-bate").to(device)
tokenizer = AutoTokenizer.from_pretrained("2121-8/japanese-parler-tts-mini-bate")

prompt = "こんにちは、今日はどのようにお過ごしですか?"
description = "A female speaker with a slightly high-pitched voice delivers her words at a moderate speed with a quite monotone tone in a confined environment, resulting in a quite clear audio recording."

# Annotate the Japanese text with ruby (furigana) readings before tokenization
prompt = add_ruby(prompt)

input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio_arr = generation.cpu().numpy().squeeze()
sf.write("parler_tts_japanese_out.wav", audio_arr, model.config.sampling_rate)
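For repeated synthesis, the generation steps can be wrapped in a small helper. This is only a sketch, not part of the official API: `generate_speech` is a hypothetical name, and the model, tokenizer, and device are passed in explicitly so the helper itself carries no loading logic.

```python
# Hypothetical helper wrapping the generation steps above.
# model/tokenizer/device are the objects loaded in the quick-start snippet.
def generate_speech(model, tokenizer, device, prompt, description):
    """Synthesize `prompt` guided by `description`; returns a 1-D audio array."""
    input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
    prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
    generation = model.generate(input_ids=input_ids,
                                prompt_input_ids=prompt_input_ids)
    return generation.cpu().numpy().squeeze()
```

With the objects from the quick-start snippet, a call might look like `audio_arr = generate_speech(model, tokenizer, device, add_ruby(prompt), description)`, followed by `sf.write` as above.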
Sample Audio
Advanced Usage
Support for specifying a particular speaker is in preparation.
📚 Documentation
📄 License
This model and repository may be used for a wide range of purposes, including research, education, and commercial applications. However, please comply with the following conditions:
Commercial Use Conditions
You can use the speech and results generated by this model for commercial purposes, but selling the model itself (files, weight data, etc.) is prohibited.
Disclaimer of Appropriateness
The creator makes no guarantee regarding the accuracy, legality, or appropriateness of the results obtained from using this model.
User's Responsibility
When using this model, please comply with all applicable laws and regulations. All responsibilities arising from the generated content belong to the user.
Creator's Disclaimer
The creator of this repository and model assumes no responsibility for copyright infringement or other legal issues.
Response to Deletion Requests
In the event of a copyright issue, the problematic resources or data will be promptly deleted.
⚠️ Important Note
Due to the composition of the training data, there is limited data for male voices. As a result, generating male voices may not work as expected; in particular, natural intonation and sound quality may be difficult to achieve.
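Given this limitation, it helps to state the desired voice attributes explicitly in the description prompt. The strings below are illustrative examples only (the exact wording is an assumption; as in the quick-start snippet, the description is free-form English text):

```python
# Illustrative description prompts (wording is an assumption; the description
# is free-form English text, as in the quick-start example).
female_description = (
    "A female speaker with a slightly high-pitched voice delivers her words "
    "at a moderate speed, resulting in a quite clear audio recording."
)
# Male voices are under-represented in the training data, so output from a
# description like this may sound less natural.
male_description = (
    "A male speaker with a low-pitched voice delivers his words at a moderate "
    "speed in a quiet environment, resulting in a clear audio recording."
)
```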
💡 Usage Tip
The official release versions of Japanese Parler-TTS Mini and Large are in preparation. Stay tuned for updates.