# 🚀 Uploaded Model

This is an uploaded Llama-based model for high-quality text-to-speech generation. It was finetuned by Prince-1 from unsloth/orpheus-3b-0.1-ft-unsloth-bnb-4bit, is licensed under Apache 2.0, and was trained 2x faster with Unsloth and Hugging Face's TRL library.

## Model Information

| Property | Details |
|---|---|
| Finetuned by | Prince-1 |
| License | apache-2.0 |
| Finetuned from model | unsloth/orpheus-3b-0.1-ft-unsloth-bnb-4bit |
| Base Model | unsloth/orpheus-3b-0.1-ft-unsloth-bnb-4bit |
| Tags | text-generation-inference, transformers, unsloth, llama, trl, tts, text-to-speech, gguf, llama-cpp-python |
| Library Name | transformers |
| Language | en |
| Datasets | MrDragonFox/Elise |
## Model Features

Orpheus TTS is a state-of-the-art, Llama-based Speech-LLM designed for high-quality, empathetic text-to-speech generation. This model has been finetuned to deliver human-level speech synthesis with exceptional clarity, expressiveness, and real-time streaming performance.
## ✨ Features

### Model Capabilities
- Human-Like Speech: Natural intonation, emotion, and rhythm superior to SOTA closed-source models.
- Zero-Shot Voice Cloning: Clone voices without prior fine-tuning.
- Guided Emotion and Intonation: Control speech and emotion characteristics with simple tags (see the sketch below).
- Low Latency: ~200 ms streaming latency for real-time applications, reducible to ~100 ms with input streaming.
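A minimal sketch of how a tagged prompt might be assembled. The `voice: text` layout and the inline tag names (`<laugh>`, `<sigh>`) follow the upstream Orpheus TTS project and should be treated as assumptions for this particular finetune:

```python
# Build a prompt with inline emotion tags (tag set assumed from upstream
# Orpheus TTS; verify which tags this finetune was trained on).
voice = "tara"
text = "Well, that was unexpected <laugh> but let's keep going <sigh>."
prompt = f"{voice}: {text}"
print(prompt)
```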
## 📦 Installation

The model has been converted to GGUF format. When converting, you can choose the `quantization_method` from the options below (see the conversion sketch after this list):
- not_quantized: Recommended. Fast conversion. Slow inference, big files.
- fast_quantized: Recommended. Fast conversion. OK inference, OK file size.
- quantized: Recommended. Slow conversion. Fast inference, small files.
- f32: Not recommended. Retains 100% accuracy, but super slow and memory hungry.
- f16: Fastest conversion + retains 100% accuracy. Slow and memory hungry.
- q8_0: Fast conversion. High resource use, but generally acceptable.
- q4_k_m: Recommended. Uses Q6_K for half of the `attention.wv` and `feed_forward.w2` tensors, else Q4_K.
- q5_k_m: Recommended. Uses Q6_K for half of the `attention.wv` and `feed_forward.w2` tensors, else Q5_K.
- q2_k: Uses Q4_K for the `attention.wv` and `feed_forward.w2` tensors, Q2_K for the other tensors.
- q3_k_l: Uses Q5_K for the `attention.wv`, `attention.wo`, and `feed_forward.w2` tensors, else Q3_K.
- q3_k_m: Uses Q4_K for the `attention.wv`, `attention.wo`, and `feed_forward.w2` tensors, else Q3_K.
- q3_k_s: Uses Q3_K for all tensors.
- q4_0: Original quant method, 4-bit.
- q4_1: Higher accuracy than q4_0 but not as high as q5_0. However, has quicker inference than q5 models.
- q4_k_s: Uses Q4_K for all tensors.
- q4_k: Alias for q4_k_m.
- q5_k: Alias for q5_k_m.
- q5_0: Higher accuracy, higher resource usage and slower inference.
- q5_1: Even higher accuracy and resource usage, and slower inference.
- q5_k_s: Uses Q5_K for all tensors.
- q6_k: Uses Q8_K for all tensors.
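For reference, a minimal sketch of how such a GGUF export is typically done with Unsloth's `save_pretrained_gguf` helper. The output directory and the choice of `q4_k_m` are illustrative assumptions, not values prescribed by this card:

```python
from unsloth import FastLanguageModel

# Load the finetuned model (4-bit load keeps memory low during export).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/orpheus-3b-0.1-ft-unsloth-bnb-4bit",
    load_in_4bit=True,
)

# Export to GGUF; "q4_k_m" is one of the quantization_method options above.
model.save_pretrained_gguf("orpheus-gguf", tokenizer, quantization_method="q4_k_m")
```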
## 💻 Usage Examples

### Basic Usage

```python
# The package is installed as `llama-cpp-python` but imported as `llama_cpp`.
from llama_cpp import Llama
```
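A slightly fuller sketch of loading and prompting the GGUF file. The filename, context size, and prompt here are illustrative assumptions, and turning Orpheus's generated audio tokens into a waveform requires the codec/vocoder from the upstream project, which is out of scope here:

```python
from llama_cpp import Llama

# Hypothetical filename; substitute the quantization you actually exported.
llm = Llama(model_path="orpheus-3b-0.1-ft-q4_k_m.gguf", n_ctx=2048)

# Orpheus generates audio tokens rather than plain text; decoding them into
# audio requires the upstream codec/vocoder (not shown).
output = llm("tara: Hello! This is a quick Orpheus TTS test.", max_tokens=512)
print(output["choices"][0]["text"])
```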
## 📄 License

This model is licensed under the Apache 2.0 license.
## ⚠️ Important Note

Do not use our models for impersonation without consent, misinformation or deception (including fake news or fraudulent calls), or any illegal or harmful activity. By using this model, you agree to follow all applicable laws and ethical guidelines. We disclaim responsibility for any misuse.