Parler-TTS Mini v0.1 - Jenny
This is a version of Parler-TTS Mini v0.1 fine-tuned on the 30-hour, single-speaker, high-quality Jenny dataset (she's Irish ☘️), which is suitable for training a TTS model. Its usage is similar to Parler-TTS v0.1: just specify the keyword "Jenny" in the voice description.
Quick Start
Installation
You can install the necessary library using the following command:
pip install git+https://github.com/huggingface/parler-tts.git
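The inference snippet under Usage Examples imports `torch`, `transformers`, `parler_tts`, and `soundfile`. As a minimal sketch, the check below reports which of those modules cannot be found in the current environment. The helper name `missing_packages` is hypothetical and not part of any library; it only looks modules up and does not install anything.

```python
from importlib.util import find_spec

# Top-level modules imported by the usage example in this README.
REQUIRED_MODULES = ["torch", "transformers", "parler_tts", "soundfile"]


def missing_packages(names):
    """Return the top-level module names that cannot be located (hypothetical helper)."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All modules needed by the example were found.")
```

If a module is reported as missing, it probably needs to be installed separately before the example will run.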
Usage Examples
Basic Usage
You can use the model with the following inference snippet:
import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf

# Use the first GPU when available, otherwise fall back to the CPU.
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the fine-tuned checkpoint and its tokenizer.
model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler-tts-mini-jenny-30H").to(device)
tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler-tts-mini-jenny-30H")

# `prompt` is the text to speak; `description` describes the voice and names "Jenny".
prompt = "Hey, how are you doing today? My name is Jenny, and I'm here to help you with any questions you have."
description = "Jenny speaks at an average pace with an animated delivery in a very confined sounding environment with clear audio quality."

# Tokenize the description and the prompt separately.
input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

# Generate the waveform and save it as a WAV file at the model's sampling rate.
generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio_arr = generation.cpu().numpy().squeeze()
sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
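Because the voice is selected by naming "Jenny" in the description, it can help to build descriptions programmatically so the keyword is never dropped. The following is a sketch only: `build_description` is a hypothetical helper, its defaults simply reproduce the example description above, and which other wordings the model responds to well is an assumption that this card does not document.

```python
def build_description(
    pace="an average pace",
    delivery="an animated delivery",
    environment="a very confined sounding environment",
    quality="clear audio quality",
    speaker="Jenny",
):
    """Compose a voice description that names the speaker explicitly.

    Hypothetical helper: the sentence template mirrors the example
    description in this README and is not an official format.
    """
    if not speaker.strip():
        raise ValueError("speaker must be a non-empty name such as 'Jenny'")
    return (
        f"{speaker} speaks at {pace} with {delivery} "
        f"in {environment} with {quality}."
    )


# The defaults reproduce the description used in the basic usage snippet.
description = build_description()
# Varying a single attribute (an illustrative value, not a documented option).
slow_description = build_description(pace="a slow pace")
```

The resulting string can be passed to the tokenizer in place of the hard-coded `description`.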
Documentation
Fine-tuning Guide
- Fine-tuning guide on Colab:
Citation
If you found this repository useful, please consider citing this work and also the original Stability AI paper:
@misc{lacombe-etal-2024-parler-tts,
author = {Yoach Lacombe and Vaibhav Srivastav and Sanchit Gandhi},
title = {Parler-TTS},
year = {2024},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/huggingface/parler-tts}}
}
@misc{lyth2024natural,
title={Natural language guidance of high-fidelity text-to-speech with synthetic annotations},
author={Dan Lyth and Simon King},
year={2024},
eprint={2402.01912},
archivePrefix={arXiv},
primaryClass={cs.SD}
}
License
Attribution is required in software/websites/projects/interfaces (including voice interfaces) that generate audio in response to user action using this dataset. Attribution means: the voice must be referred to as "Jenny", and where at all practical, "Jenny (Dioco)". Attribution is not required when distributing the generated clips (although welcome). Commercial use is permitted. Don't do unfair things like claim the dataset is your own. No further restrictions apply.
Additional Resources
Model Information
| Property | Details |
|----------|---------|
| Library Name | transformers |
| Tags | text-to-speech, annotation |
| Language | en |
| Pipeline Tag | text-to-speech |
| Inference | false |
| Datasets | ylacombe/jenny-tts-10k-tagged, reach-vb/jenny_tts_dataset |