Orpheus is a high-performance text-to-speech model fine-tuned to achieve natural and emotionally rich speech synthesis. This repository hosts the 8-bit quantized version of the 3-billion-parameter model, optimizing efficiency while maintaining high-quality output.
Orpheus-3b-FT-Q8_0 is a 3-billion-parameter text-to-speech model that converts text input into natural speech, supporting multiple voice tones and emotional expressions. The model has been quantized to 8-bit (Q8_0) format for efficient inference, enabling it to run on consumer-grade hardware.
Model Features
Multi-voice support: Offers 8 selectable voices with different characteristics, including male and female voices and various emotional expressions.
Emotional tags: Supports inserting emotional tags such as laughter and sighs to enhance speech expressiveness.
Efficient inference: The 8-bit quantized version is optimized to run efficiently on consumer-grade hardware.
High-quality audio: Generates high-quality 24kHz mono audio.
CUDA acceleration: Optimized for CUDA acceleration on RTX graphics cards.
Model Capabilities
Text-to-speech
Multi-voice speech synthesis
Emotional speech generation
24kHz audio output
Use Cases
Voice interaction
Smart assistants
Provides natural voice output for smart assistants
Generates emotionally rich conversational speech
Audiobooks
Converts text content into audiobooks
Supports different character voices and emotional expressions
Media production
Video narration
Generates professional narration for video content
Offers multiple professional voice options
Game voice acting
Generates dynamic voices for game characters
Supports emotional tags to enhance expressiveness
🚀 Orpheus-3b-FT-Q8_0
This is a high-performance quantised Text-to-Speech model, fine-tuned for natural, emotional speech synthesis, offering efficient inference on consumer hardware.
git clone https://github.com/Lex-au/Orpheus-FastAPI.git
cd Orpheus-FastAPI
Step 4: Configure the Server
Configure the FastAPI server to connect to your inference server by setting the ORPHEUS_API_URL environment variable.
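As a rough illustration of this step, the Python sketch below shows how a client or launcher script might read `ORPHEUS_API_URL` and sanity-check it before use. The fallback URL is a hypothetical placeholder for a locally running inference server, not a documented default; the Orpheus-FastAPI README is the authority on how the server itself consumes the variable.

```python
import os
from urllib.parse import urlparse

# Hypothetical placeholder for a local inference server's endpoint; this is
# NOT a documented default. Replace it with your own server's address.
FALLBACK_API_URL = "http://127.0.0.1:1234/v1/completions"


def resolve_api_url(env=None) -> str:
    """Return ORPHEUS_API_URL from env (defaults to os.environ), else the fallback.

    Raises ValueError if the value is not an absolute http(s) URL.
    """
    env = os.environ if env is None else env
    url = env.get("ORPHEUS_API_URL", FALLBACK_API_URL)
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"ORPHEUS_API_URL is not an absolute http(s) URL: {url!r}")
    return url
```

In a shell, the variable would typically be exported before starting the server, e.g. `export ORPHEUS_API_URL=http://127.0.0.1:1234/v1/completions`, with the placeholder address replaced by your inference server's real endpoint.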
Step 5: Follow the Instructions
Follow the complete installation and setup instructions in the repository README.
✨ Features
Multiple Voice Options: 8 distinct voice options with different characteristics.
Emotion Support: Support for emotion tags like laughter, sighs, etc.
GPU Acceleration: Optimised for CUDA acceleration on RTX GPUs.
High-Quality Audio: Produces high-quality 24kHz mono audio.
Conversational Naturalness: Fine-tuned for conversational naturalness.
💻 Usage Examples
Basic Usage
This model is designed to be used with an LLM inference server that connects to the [Orpheus-FastAPI](https://github.com/Lex-au/Orpheus-FastAPI) frontend, which provides both a web UI and OpenAI-compatible API endpoints.
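Because the frontend exposes OpenAI-compatible endpoints, a client request might look like the sketch below. It assumes the speech route and body fields mirror OpenAI's `/v1/audio/speech` API (`model`, `input`, `voice`, `response_format`); the host, port, and model identifier are hypothetical placeholders, so confirm the actual values in the Orpheus-FastAPI README. Only the Python standard library is used.

```python
import json
import urllib.request

# Hypothetical address of a running Orpheus-FastAPI instance.
BASE_URL = "http://localhost:5005"


def build_speech_request(text, voice="tara", base_url=BASE_URL):
    """Build (but do not send) a POST request for speech synthesis.

    Assumption: OpenAI-style route and field names.
    """
    payload = {
        "model": "orpheus",        # hypothetical model identifier
        "input": text,             # may contain emotion tags such as <laugh>
        "voice": voice,            # one of the voice names listed in this card
        "response_format": "wav",
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path, voice="tara", base_url=BASE_URL, timeout=120):
    """Send the request and write the returned audio bytes to out_path."""
    req = build_speech_request(text, voice, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)
```

With a server running, `synthesize("Hello there. <laugh>", "hello.wav", voice="tara")` would save the response to `hello.wav` under the assumptions above.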
Advanced Usage
You can use different voices and emotion tags to add expressiveness to the speech. For example, insert <laugh> to add laughter sounds.
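Combining the two ideas, a multi-voice script (for example, dialogue for an audiobook or a game scene) can be expressed as one request body per line, each with its own voice and inline tags. This is a hypothetical sketch: the `SCRIPT` contents and the helper name are illustrative, and the body fields assume OpenAI-style `input`/`voice`/`response_format` names.

```python
# Hypothetical dialogue: each entry pairs a voice name with text that may
# contain inline emotion tags such as <laugh> or <sigh>.
SCRIPT = [
    ("tara", "Did you really bake all of these yourself? <laugh>"),
    ("leo", "I did. <sigh> It took most of the afternoon."),
    ("tara", "Well, they were worth it. <chuckle>"),
]


def script_to_payloads(script, response_format="wav"):
    """Build one request body per line of dialogue, preserving order.

    Field names are an assumption (OpenAI-style speech API).
    """
    return [
        {"input": text, "voice": voice, "response_format": response_format}
        for voice, text in script
    ]


for body in script_to_payloads(SCRIPT):
    print(body["voice"], "->", body["input"])
```

Each body can then be synthesised separately and the resulting clips concatenated in script order.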
📚 Documentation
Model Description
Orpheus-3b-FT-Q8_0 is a 3-billion-parameter Text-to-Speech model that converts text inputs into natural-sounding speech with support for multiple voices and emotional expressions. The model has been quantised to 8-bit (Q8_0) format for efficient inference, making it accessible on consumer hardware.
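To make "efficient inference" more concrete, the sketch below estimates the weight storage implied by Q8_0 and illustrates per-block quantisation. It assumes the standard GGUF/ggml Q8_0 layout, in which each block of 32 weights is stored as 32 signed 8-bit integers plus one fp16 scale (34 bytes per 32 weights). The estimate ignores file metadata, any tensors kept at higher precision, the KV cache, and runtime overhead, and the quantiser shows the idea rather than being a bit-exact port of ggml.

```python
QK8_0 = 32                       # weights per Q8_0 block
BYTES_PER_BLOCK = 2 + QK8_0      # one fp16 scale (2 bytes) + 32 int8 values


def q8_0_weight_bytes(n_params: int) -> int:
    """Approximate bytes needed to store n_params weights in Q8_0."""
    n_blocks = -(-n_params // QK8_0)  # ceiling division
    return n_blocks * BYTES_PER_BLOCK


def quantize_block(xs):
    """Quantise one block of 32 floats to (scale, int8 values).

    Scale d = max|x| / 127, so the largest-magnitude value maps to +/-127.
    Here d stays a Python float; GGUF stores it as fp16.
    """
    if len(xs) != QK8_0:
        raise ValueError(f"expected {QK8_0} values, got {len(xs)}")
    amax = max(abs(x) for x in xs)
    d = amax / 127.0
    inv = 1.0 / d if d else 0.0
    qs = [max(-127, min(127, round(x * inv))) for x in xs]
    return d, qs


def dequantize_block(d, qs):
    """Recover approximate floats from a quantised block."""
    return [d * q for q in qs]


print(f"{q8_0_weight_bytes(3_000_000_000) / 1e9:.2f} GB")  # 3.19 GB
```

For the card's ~3 billion parameters this gives about 3.19 GB of weights, roughly half the ~6 GB the same weights would occupy at 16-bit precision; each reconstructed value differs from the original by at most half a quantisation step (d/2).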
Available Voices
The model supports 8 different voices:
tara: Female, conversational, clear
leah: Female, warm, gentle
jess: Female, energetic, youthful
leo: Male, authoritative, deep
dan: Male, friendly, casual
mia: Female, professional, articulate
zac: Male, enthusiastic, dynamic
zoe: Female, calm, soothing
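The voice list above can be mirrored in client code so that typos are caught before a request is sent. This sketch assumes voices are requested by the lowercase names shown here; the normalisation (trimming and lowercasing) and the helper names are illustrative choices, not part of the model or the server.

```python
# Voice names and descriptions as listed in this model card.
VOICES = {
    "tara": ("female", "conversational, clear"),
    "leah": ("female", "warm, gentle"),
    "jess": ("female", "energetic, youthful"),
    "leo": ("male", "authoritative, deep"),
    "dan": ("male", "friendly, casual"),
    "mia": ("female", "professional, articulate"),
    "zac": ("male", "enthusiastic, dynamic"),
    "zoe": ("female", "calm, soothing"),
}


def check_voice(name: str) -> str:
    """Normalise a voice name and fail early if it is not one of the 8 voices."""
    key = name.strip().lower()
    if key not in VOICES:
        choices = ", ".join(sorted(VOICES))
        raise ValueError(f"unknown voice {name!r}; choose one of: {choices}")
    return key


def voices_by_gender(gender: str) -> list[str]:
    """Return the sorted voice names whose listed gender matches."""
    return sorted(v for v, (g, _) in VOICES.items() if g == gender)
```

For example, `check_voice(" Tara ")` returns `"tara"`, while an unlisted name raises `ValueError` listing the valid choices.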
Emotion Tags
You can add expressiveness to speech by inserting tags:
<laugh>, <chuckle>: For laughter sounds
<sigh>: For sighing sounds
<cough>, <sniffle>: For subtle interruptions
<groan>, <yawn>, <gasp>: For additional emotional expression
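This card does not say how tags outside this list are handled, so it can help to check text against the supported set before synthesis. The sketch below treats any single word in angle brackets as a tag and reports those not listed above; the regular expression, the case-sensitive comparison, and the helper names are assumptions made for illustration.

```python
import re

# Emotion tags listed in this model card.
SUPPORTED_TAGS = {"laugh", "chuckle", "sigh", "cough", "sniffle", "groan", "yawn", "gasp"}

# Assumption: a tag is a single word between angle brackets, e.g. <laugh>.
TAG_RE = re.compile(r"<([A-Za-z]+)>")


def find_unsupported_tags(text: str) -> list[str]:
    """Return the sorted, de-duplicated tag names in text that are not supported.

    The comparison is case-sensitive, so <Laugh> is reported as unsupported.
    """
    return sorted({tag for tag in TAG_RE.findall(text) if tag not in SUPPORTED_TAGS})


def strip_tags(text: str) -> str:
    """Remove all tags and collapse leftover runs of whitespace."""
    return re.sub(r"\s{2,}", " ", TAG_RE.sub("", text)).strip()
```

`strip_tags` can be useful for producing a plain-text transcript alongside the generated audio.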
🔧 Technical Details
| Property | Details |
|---|---|
| Model Type | Specialised token-to-audio sequence model |
| Parameters | ~3 billion |
| Quantisation | 8-bit (GGUF Q8_0 format) |
| Audio Sample Rate | 24kHz |
| Input | Text with optional voice selection and emotion tags |
| Output | High-quality WAV audio |
| Language | English |
| Hardware Requirements | CUDA-compatible GPU (recommended: RTX series) |
| Integration Method | External LLM inference server + Orpheus-FastAPI frontend |
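The table specifies 24kHz mono WAV output. The sketch below uses Python's standard `wave` module to write a short test tone in that format and to read back the properties of WAV data, which is one way to check audio returned by the server. The card does not state a bit depth, so 16-bit PCM here is an assumption, and the helper names are illustrative.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 24_000  # from the table above
CHANNELS = 1          # mono
SAMPLE_WIDTH = 2      # assumption: 16-bit PCM


def write_tone_wav(seconds=0.5, freq=440.0) -> bytes:
    """Return WAV bytes containing a quiet sine tone in the card's output format."""
    n = int(SAMPLE_RATE * seconds)
    frames = b"".join(
        struct.pack("<h", int(0.3 * 32767 * math.sin(2 * math.pi * freq * i / SAMPLE_RATE)))
        for i in range(n)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(CHANNELS)
        w.setsampwidth(SAMPLE_WIDTH)
        w.setframerate(SAMPLE_RATE)
        w.writeframes(frames)
    return buf.getvalue()


def describe_wav(data: bytes) -> dict:
    """Report channel count, sample rate, sample width, and duration of WAV bytes."""
    with wave.open(io.BytesIO(data), "rb") as w:
        rate = w.getframerate()
        return {
            "channels": w.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": w.getsampwidth(),
            "duration_s": w.getnframes() / rate,
        }
```

The same `describe_wav` call can be applied to the bytes of a file saved by a client, to confirm it is mono at 24kHz.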
Limitations
⚠️ Important Note
Currently supports English text only.
Best performance is achieved on CUDA-compatible GPUs.
Generation speed depends on GPU capability.
📄 License
This model is available under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
Citation & Attribution
The original Orpheus model was created by Canopy Labs. This repository contains a quantised version optimised for use with the Orpheus-FastAPI server.
If you use this quantised model in your research or applications, please cite:
@misc{orpheus-tts-2025,
author = {Canopy Labs},
title = {Orpheus-3b-0.1-ft: Text-to-Speech Model},
year = {2025},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/canopylabs/orpheus-3b-0.1-ft}}
}
@misc{orpheus-quantised-2025,
author = {Lex-au},
title = {Orpheus-3b-FT-Q8_0: Quantised TTS Model with FastAPI Server},
note = {GGUF quantisation of canopylabs/orpheus-3b-0.1-ft},
year = {2025},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/lex-au/Orpheus-3b-FT-Q8_0.gguf}}
}