parler-tiny-v1-jenny Open-source TTS Model - Freely Convert English Text to Speech

Parler Tiny V1 Jenny

Developed by parler-tts

Jenny TTS is a transformers-based text-to-speech model supporting English speech synthesis.

Speech Synthesis

Transformers

English #English speech synthesis #Annotation dataset driven #Personalized voice generation

Downloads 48

Release Time: 9/4/2024

Model Overview

This model specializes in English text-to-speech tasks, capable of converting input English text into natural speech output.

Model Features

English Speech Synthesis

Specializes in high-quality speech synthesis from English text

Transformers-based

Uses a Transformer architecture for text-to-speech conversion

Annotation Dataset Support

Trained on annotated datasets; the annotations supply the natural-language voice descriptions used to steer synthesis

Model Capabilities

English text-to-speech

Speech synthesis

Use Cases

Voice Assistants

Virtual Assistant Voice Generation

Generates natural speech responses for English virtual assistants

Enhances user experience

Audio Content Creation

E-book Narration

Converts English e-book text into speech

Creates audiobooks

🚀 Parler-TTS Tiny v1 - Jenny

This is a version of Parler-TTS Tiny v1 fine-tuned on the Jenny dataset, a 30-hour, single-speaker, high-quality dataset suitable for training TTS models. It is used in much the same way as Parler-TTS v1: include the keyword “Jenny” in the voice description.
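Because the voice is selected by naming “Jenny” in the description, it can help to build descriptions programmatically so the keyword is never dropped. The following is a minimal sketch, not part of Parler-TTS: the helper names `build_jenny_description` and `ensure_jenny` and their parameters are hypothetical, and the default phrases are simply taken from the example description on this page.

```python
def build_jenny_description(
    pace="an average pace",
    delivery="an animated delivery",
    environment="a very confined sounding environment",
    quality="clear audio quality",
):
    """Compose a voice description that always names the speaker "Jenny".

    Hypothetical helper: the phrase slots are an assumption, not a
    schema required by the model.
    """
    return f"Jenny speaks at {pace} with {delivery} in {environment} with {quality}."


def ensure_jenny(description):
    """Return the description unchanged, or raise if the keyword is missing."""
    if "jenny" not in description.lower():
        raise ValueError('Voice description should mention the keyword "Jenny".')
    return description


description = ensure_jenny(build_jenny_description(pace="a slightly slow pace"))
print(description)
```

The resulting string can be passed as `description` when tokenizing the voice description for generation.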

Fine-tuning guide on Colab:

🚀 Quick Start

📦 Installation

pip install git+https://github.com/huggingface/parler-tts.git

💻 Usage Examples

Basic Usage

import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf

# Use the first GPU when available, otherwise fall back to CPU.
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler-tiny-v1-jenny").to(device)
tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler-tiny-v1-jenny")

# `prompt` is the text to be spoken; `description` controls how it is spoken.
# The description must contain the keyword "Jenny" to select the fine-tuned voice.
prompt = "Hey, how are you doing today? My name is Jenny, and I'm here to help you with any questions you have."
description = "Jenny speaks at an average pace with an animated delivery in a very confined sounding environment with clear audio quality."

# Tokenize the description and the prompt separately.
input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

# Generate the waveform, move it to the CPU, and drop the batch dimension.
generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio_arr = generation.cpu().numpy().squeeze()

# Write the audio to disk at the model's sampling rate.
sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
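For longer inputs, such as the e-book narration use case listed above, one possible approach is to split the text into sentence-sized chunks and synthesize each chunk with the code above. The sketch below only covers the splitting step and rests on assumptions: that shorter prompts are easier for the model to handle, that sentence boundaries can be found with a simple punctuation rule, and that 200 characters is a reasonable limit. The function name `split_into_chunks` and the `max_chars` parameter are hypothetical.

```python
import re


def split_into_chunks(text, max_chars=200):
    """Group whole sentences into chunks of at most `max_chars` characters.

    A single sentence longer than `max_chars` is kept whole as its own chunk.
    The 200-character default is an arbitrary assumption, not a model limit.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


for chunk in split_into_chunks("One. Two! Three?", max_chars=9):
    print(chunk)
```

Each chunk could then be used as `prompt` in the Basic Usage code, with the resulting audio arrays joined in order before being written to a single file.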

📚 Documentation

📄 Citation

If you found this repository useful, please consider citing this work and also the original Stability AI paper:

@misc{lacombe-etal-2024-parler-tts,
  author = {Yoach Lacombe and Vaibhav Srivastav and Sanchit Gandhi},
  title = {Parler-TTS},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/huggingface/parler-tts}}
}

@misc{lyth2024natural,
      title={Natural language guidance of high-fidelity text-to-speech with synthetic annotations},
      author={Dan Lyth and Simon King},
      year={2024},
      eprint={2402.01912},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}

📄 License

License - Attribution is required in software/websites/projects/interfaces (including voice interfaces) that generate audio in response to user action using this dataset. Attribution means: the voice must be referred to as "Jenny", and where at all practical, "Jenny (Dioco)". Attribution is not required when distributing the generated clips (although welcome). Commercial use is permitted. Don't do unfair things like claim the dataset is your own. No further restrictions apply.
