Parler-TTS Mini v1 - Jenny
Parler-TTS Mini v1 - Jenny is a fine-tuned text-to-speech model. It is based on the 30-hour single-speaker high-quality Jenny dataset, which is well suited to TTS model training, and it is used in much the same way as Parler-TTS Mini v1.
- Fine-tuning guide on Colab:

Quick Start
This is a fine-tuned version of Parler-TTS Mini v1 on the [30-hour single-speaker high-quality Jenny (she's Irish) dataset](https://github.com/dioco-group/jenny-tts-dataset). Its usage is more or less the same as Parler-TTS Mini v1: you just need to specify the keyword "Jenny" in the voice description.
Installation
```sh
pip install git+https://github.com/huggingface/parler-tts.git
```
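As an optional sanity check (a minimal sketch, assuming a standard Python environment), you can confirm the package imports cleanly before moving on:

```sh
# Smoke test: verify the parler_tts package is importable.
python -c "import parler_tts; print('parler-tts installed')"
```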
Usage Examples
Basic Usage
You can use the model with the following inference snippet:
```py
import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf

device = "cuda:0" if torch.cuda.is_available() else "cpu"

model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler-mini-v1-jenny").to(device)
tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler-mini-v1-jenny")

prompt = "Hey, how are you doing today? My name is Jenny, and I'm here to help you with any questions you have."
description = "Jenny speaks at an average pace with an animated delivery in a very confined sounding environment with clear audio quality."

# The description conditions the voice; the prompt is the text to be spoken.
input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

# Generate audio and write it out at the model's native sampling rate.
generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio_arr = generation.cpu().numpy().squeeze()
sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
```
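Because the voice is conditioned on the free-text description, you can nudge pacing and tone by editing that string while keeping the keyword "Jenny". A minimal sketch reusing the model and tokenizer loaded above (the alternative description and prompt below are illustrative assumptions, not official prompts):

```py
# Hypothetical alternative description: same API, different delivery cues.
description = "Jenny delivers her words quite slowly, in a calm tone, in a very confined sounding environment with clear audio quality."
prompt = "Top o' the morning! Let me tell you about the weather in Dublin today."

input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
sf.write("parler_tts_out_slow.wav", generation.cpu().numpy().squeeze(), model.config.sampling_rate)
```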
Documentation
Datasets
- ylacombe/jenny-tts-tagged-v1
- reach-vb/jenny_tts_dataset
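If you want to inspect the training data, both datasets are hosted on the Hugging Face Hub and can be loaded with the `datasets` library. A minimal sketch, assuming a "train" split is available (check the dataset cards for the exact splits and columns):

```py
from datasets import load_dataset

# Load the tagged Jenny dataset from the Hub; the split name is an assumption.
ds = load_dataset("ylacombe/jenny-tts-tagged-v1", split="train")
print(ds)  # shows the available columns and number of rows
```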
Tags
- text-to-speech
- annotation
Pipeline Tag
text-to-speech
Inference
false
Library Name
transformers
Citation
If you find this repository useful, please consider citing this work, as well as the original Stability AI paper:
```bibtex
@misc{lacombe-etal-2024-parler-tts,
  author = {Yoach Lacombe and Vaibhav Srivastav and Sanchit Gandhi},
  title = {Parler-TTS},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/huggingface/parler-tts}}
}

@misc{lyth2024natural,
  title = {Natural language guidance of high-fidelity text-to-speech with synthetic annotations},
  author = {Dan Lyth and Simon King},
  year = {2024},
  eprint = {2402.01912},
  archivePrefix = {arXiv},
  primaryClass = {cs.SD}
}
```
License
Attribution is required in software/websites/projects/interfaces (including voice interfaces) that generate audio in response to user action using this dataset. Attribution means: the voice must be referred to as "Jenny", and where at all practical, "Jenny (Dioco)". Attribution is not required when distributing the generated clips (although welcome). Commercial use is permitted. Don't do unfair things like claim the dataset is your own. No further restrictions apply.