Parler-TTS Mini Open-Source Text-to-Speech Model - A Lightweight and Free Application with Controllable Voice Features

Parler-TTS Mini v0.1

Developed by parler-tts

Parler-TTS Mini is a lightweight text-to-speech model trained on 10.5K hours of audio data, supporting voice feature control through text prompts.

Speech Synthesis

Transformers

English

Open Source License: Apache-2.0

#High-quality speech synthesis #Natural language prompt control #Lightweight TTS

Downloads 5,430

Release Time: 4/9/2024

Model Overview

This is a high-quality text-to-speech model that generates natural, fluent speech. Voice features such as gender, background noise, speech rate, pitch, and reverberation can be controlled with simple text prompts.

Model Features

Voice feature control

Control voice features such as gender, background noise, speech rate, pitch, and reverberation through text prompts.

High-quality audio

Generate high-quality, natural and fluent speech output.

Fully open-source

All datasets, preprocessing code, training code, and weights are publicly released.

Lightweight

The model is small, making it suitable for resource-limited environments.

Model Capabilities

Text-to-speech

Voice feature control

High-quality audio generation

Use Cases

Speech synthesis

Audiobook generation

Generate natural and fluent speech versions for e-books or articles.

High-quality, expressive speech output.

Voice assistant

Provide more natural voice interaction capabilities for virtual assistants.

Personalized voice with controllable features.

Assistive technology

Visual impairment assistance

Convert text content into speech for visually impaired individuals.

Clear and understandable speech output.

🚀 Parler-TTS Mini v0.1

Parler-TTS Mini v0.1 is a lightweight text-to-speech (TTS) model. Trained on 10.5K hours of audio data, it can generate high-quality, natural-sounding speech. Voice features such as gender, background noise, speaking rate, pitch, and reverberation can be controlled with a simple text prompt. It is the first model released by the Parler-TTS project, which aims to provide the community with TTS training resources and dataset pre-processing code.

Fine-tuning guide on Colab:

🚀 Quick Start

Using Parler-TTS is as simple as "bonjour". First, install the library:

pip install git+https://github.com/huggingface/parler-tts.git

Then, you can use the model with the following inference snippet:

import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf

# Use the first GPU when one is available, otherwise fall back to the CPU.
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the model weights and the matching tokenizer.
model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler_tts_mini_v0.1").to(device)
tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler_tts_mini_v0.1")

# `prompt` is the text to be spoken; `description` controls how the voice sounds.
prompt = "Hey, how are you doing today?"
description = "A female speaker with a slightly low-pitched voice delivers her words quite expressively, in a very confined sounding environment with clear audio quality. She speaks very fast."

# Tokenize both strings and move them to the same device as the model.
input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

# Generate the waveform, convert it to a NumPy array, and save it as a WAV file.
generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio_arr = generation.cpu().numpy().squeeze()
sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
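If soundfile is not available in a given environment, the final save step could be approximated with Python's standard library. The sketch below is a minimal illustration, not part of Parler-TTS: `write_wav_pcm16` is a hypothetical helper that assumes a one-dimensional sequence of mono float samples in the range [-1.0, 1.0] (such as `audio_arr` from the snippet above) and stores them as 16-bit PCM.

```python
import struct
import wave


def write_wav_pcm16(path, samples, sampling_rate):
    """Write mono float samples to a 16-bit PCM WAV file.

    Hypothetical helper, not part of parler_tts. Assumes `samples` is a
    1-D sequence of floats nominally in [-1.0, 1.0]. Returns the clip
    duration in seconds.
    """
    frames = bytearray()
    for sample in samples:
        # Clip out-of-range values, then scale to a signed 16-bit integer.
        clipped = max(-1.0, min(1.0, float(sample)))
        frames += struct.pack("<h", int(round(clipped * 32767)))

    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(int(sampling_rate))
        wav_file.writeframes(bytes(frames))

    return len(samples) / float(sampling_rate)
```

With the variables from the snippet above, a call might look like `write_wav_pcm16("parler_tts_out.wav", audio_arr, model.config.sampling_rate)`. The per-sample loop is slow for long clips, so this is best treated as a fallback rather than a replacement for `sf.write`.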

💡 Usage Tip

Include the term "very clear audio" in the description to generate the highest-quality audio, and "very noisy audio" for high levels of background noise.

Punctuation can be used to control the prosody of the generations, e.g., use commas to add small breaks in speech.

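The remaining speech features (gender, speaking rate, pitch, and reverberation) can be controlled directly through the description text (the `description` string in the snippet above, not the `prompt` string that holds the words to be spoken).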
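As a rough illustration of how these tips combine, the sketch below assembles a voice description from individual feature phrases. `build_description` and its parameter names are hypothetical and not part of the parler_tts library; the default phrases simply mirror the example description in the inference snippet. Which wordings the model responds to best is an assumption to check by listening to the output.

```python
# Hypothetical helper: it only formats a string and does not call the model.
_PRONOUNS = {"female": ("her", "She"), "male": ("his", "He")}


def build_description(
    gender="female",
    pitch="slightly low-pitched",
    delivery="quite expressively",
    reverberation="very confined sounding",
    audio_quality="clear",
    rate="very fast",
):
    """Compose a voice description sentence from individual feature phrases."""
    if gender not in _PRONOUNS:
        raise ValueError(f"unsupported gender for this sketch: {gender!r}")
    possessive, subject = _PRONOUNS[gender]
    return (
        f"A {gender} speaker with a {pitch} voice delivers {possessive} words "
        f"{delivery}, in a {reverberation} environment with "
        f"{audio_quality} audio quality. {subject} speaks {rate}."
    )


# Request the cleanest audio by using the "very clear audio" wording.
clean_description = build_description(gender="male", audio_quality="very clear")
```

The resulting string would be passed to the tokenizer in place of the hard-coded `description`, while the text to be spoken stays in `prompt`.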

✨ Features

Trained on 10.5K hours of audio data.
Can generate high-quality, natural-sounding speech.
Allows control of speech features (gender, background noise, speaking rate, pitch, and reverberation) using a simple text prompt.

📚 Documentation

Motivation

Parler-TTS is a reproduction of work from the paper [Natural language guidance of high-fidelity text-to-speech with synthetic annotations](https://www.text-description-to-speech.com) by Dan Lyth and Simon King, from Stability AI and Edinburgh University respectively.

Unlike other TTS models, Parler-TTS is a fully open-source release. All of the datasets, pre-processing code, training code, and weights are released publicly under a permissive license, enabling the community to build on our work and develop their own powerful TTS models. Parler-TTS was released alongside:

The Parler-TTS repository - You can train and fine-tune your own version of the model.
The Data-Speech repository - A suite of utility scripts designed to annotate speech datasets.
The Parler-TTS organization - Where you can find the annotated datasets as well as the future checkpoints.

Citation

If you found this repository useful, please consider citing this work and also the original Stability AI paper:

@misc{lacombe-etal-2024-parler-tts,
  author = {Yoach Lacombe and Vaibhav Srivastav and Sanchit Gandhi},
  title = {Parler-TTS},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/huggingface/parler-tts}}
}

@misc{lyth2024natural,
      title={Natural language guidance of high-fidelity text-to-speech with synthetic annotations},
      author={Dan Lyth and Simon King},
      year={2024},
      eprint={2402.01912},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}

📄 License

This model is permissively licensed under the Apache 2.0 license.

📦 Datasets

Property	Details
Training Data	parler-tts/mls_eng_10k, blabble-io/libritts_r, parler-tts/libritts_r_tags_tagged_10k_generated, parler-tts/mls-eng-10k-tags_tagged_10k_generated
