Orpheus is a high-performance text-to-speech model, fine-tuned for natural, emotionally rich speech synthesis. This repository hosts the 4-bit quantized (Q4_K_M) version of the 3-billion-parameter model, which improves inference efficiency while maintaining high-quality output.
The model converts text input into natural speech and supports multiple voices and emotional expressions. Quantization to the 4-bit Q4_K_M format makes it suitable for consumer-grade hardware.
Model Features
Multi-Voice Support
Offers 8 distinctive voice options to meet various scenario needs
Emotional Expression
Supports emotional tags such as laughter and sighs to enhance speech expressiveness
Efficient Inference
4-bit quantized (Q4_K_M) format improves inference efficiency on consumer-grade hardware
High-Quality Audio
Generates 24kHz mono high-quality audio
Conversation Optimization
Fine-tuned for natural conversational flow
Model Capabilities
Text-to-Speech
Multi-Voice Speech Synthesis
Emotional Speech Generation
High-Quality Audio Output
Use Cases
Speech Synthesis
Audiobook Generation
Generates natural speech for e-books using different voice tones
24kHz high-quality audio output
Virtual Assistant
Provides emotionally rich speech interaction capabilities for virtual assistants
Supports emotional expressions like laughter and sighs
Game Character Voiceovers
Generates dynamic voiceovers for game characters
8 selectable voice tones to meet diverse character needs
🚀 Orpheus-3b-FT-Q4_K_M
Orpheus-3b-FT-Q4_K_M is a quantised Text-to-Speech model offering high-performance, natural, emotionally expressive speech synthesis. It is optimised for efficiency and can run on consumer hardware.
This model is designed to be used with an LLM inference server that connects to the [Orpheus-FastAPI](https://github.com/Lex-au/Orpheus-FastAPI) frontend, which provides both a web UI and OpenAI-compatible API endpoints.
Advanced Usage
You can add expressiveness to speech by inserting emotion tags:
<laugh>, <chuckle>: For laughter sounds
<sigh>: For sighing sounds
<cough>, <sniffle>: For subtle interruptions
<groan>, <yawn>, <gasp>: For additional emotional expression
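As a minimal sketch of working with these tags, the Python helper below appends a documented tag to a sentence and flags any angle-bracket tag that is not in the list above. The function names are hypothetical, and both the exact-match (case-sensitive) check and the end-of-sentence placement are assumptions made for illustration; this README does not specify how the model treats unlisted tags or where a tag works best.

```python
import re

# Emotion tags documented in this README.
EMOTION_TAGS = {"laugh", "chuckle", "sigh", "cough", "sniffle", "groan", "yawn", "gasp"}

# Matches simple angle-bracket tags such as <sigh>.
_TAG_PATTERN = re.compile(r"<([A-Za-z]+)>")


def find_unknown_tags(text: str) -> list[str]:
    """Return angle-bracket tags in `text` that are not documented emotion tags."""
    return [tag for tag in _TAG_PATTERN.findall(text) if tag not in EMOTION_TAGS]


def with_tag(sentence: str, tag: str) -> str:
    """Append a documented emotion tag to a sentence (placement is illustrative)."""
    if tag not in EMOTION_TAGS:
        raise ValueError(f"unknown emotion tag: {tag!r}")
    return f"{sentence} <{tag}>"


if __name__ == "__main__":
    print(with_tag("That was closer than I expected.", "sigh"))
    print(find_unknown_tags("Well <sigh> okay then <giggle>"))
```

Checking for unknown tags before synthesis is a cheap way to catch typos such as `<laught>` that might otherwise be read aloud or silently ignored.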
Audio Samples
Listen to the model in action with different voices and emotions:
Default Voice Sample
Leah (Happy)
Tara (Sad)
Zac (Contemplative)
Available Voices
The model supports 8 different voices:
tara: Female, conversational, clear
leah: Female, warm, gentle
jess: Female, energetic, youthful
leo: Male, authoritative, deep
dan: Male, friendly, casual
mia: Female, professional, articulate
zac: Male, enthusiastic, dynamic
zoe: Female, calm, soothing
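The sketch below shows one way a client might select a voice and request speech from an OpenAI-compatible endpoint. Several details are assumptions rather than facts from this README: the server address and port, the `/v1/audio/speech` path (borrowed from the OpenAI API shape), the `"orpheus"` model name, and the helper names. Check the Orpheus-FastAPI documentation for the values your deployment uses.

```python
import json
import urllib.request

# Voice names and descriptions as listed in this README.
VOICES = {
    "tara": "Female, conversational, clear",
    "leah": "Female, warm, gentle",
    "jess": "Female, energetic, youthful",
    "leo": "Male, authoritative, deep",
    "dan": "Male, friendly, casual",
    "mia": "Female, professional, articulate",
    "zac": "Male, enthusiastic, dynamic",
    "zoe": "Female, calm, soothing",
}

# Hypothetical local address; adjust to wherever your server is running.
SPEECH_URL = "http://localhost:5005/v1/audio/speech"


def build_speech_request(text: str, voice: str) -> dict:
    """Build an OpenAI-style speech request body after validating the voice."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice {voice!r}; choose one of {sorted(VOICES)}")
    # "orpheus" is an assumed model name, not confirmed by this README.
    return {"model": "orpheus", "input": text, "voice": voice, "response_format": "wav"}


def synthesize(text: str, voice: str, out_path: str = "speech.wav") -> str:
    """POST the request and save the returned audio bytes to `out_path`."""
    payload = json.dumps(build_speech_request(text, voice)).encode("utf-8")
    request = urllib.request.Request(
        SPEECH_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out_file:
        out_file.write(response.read())
    return out_path
```

With a server running, `synthesize("Good evening. <sigh> It has been a long day.", "zoe")` would write the response to `speech.wav`. Validating the voice name on the client side gives a clear error before any network call is made.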
📚 Documentation
Model Description
Orpheus-3b-FT-Q4_K_M is a 3 billion parameter Text-to-Speech model that converts text inputs into natural-sounding speech with support for multiple voices and emotional expressions. The model has been quantised to the 4-bit Q4_K_M format for efficient inference, making it accessible on consumer hardware.
🔧 Technical Details
Architecture: Specialised token-to-audio sequence model
Parameters: ~3 billion
Quantisation: 4-bit (GGUF Q4_K_M format)
Audio Sample Rate: 24kHz
Input: Text with optional voice selection and emotion tags
Output: High-quality WAV audio
Language: English
Hardware Requirements: CUDA-compatible GPU (recommended: RTX series)
Best performance is achieved on CUDA-compatible GPUs.
Generation speed depends on GPU capability.
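To confirm that generated files match the documented output format (WAV, 24kHz, mono), a small check with Python's standard `wave` module can help. This is a sketch under stated assumptions: the helper names are hypothetical, and the 16-bit sample width used to build the test clip is an assumption, since this README does not state the bit depth.

```python
import io
import wave

EXPECTED_SAMPLE_RATE = 24_000  # 24kHz, as listed in the technical details
EXPECTED_CHANNELS = 1          # mono, as listed in the feature list


def describe_wav(source) -> dict:
    """Read a WAV file (path or binary file object) and summarise its format."""
    with wave.open(source, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "sample_rate": rate,
            "channels": wav_file.getnchannels(),
            "duration_seconds": frames / rate,
        }


def matches_documented_format(info: dict) -> bool:
    """True when the audio is 24kHz mono, as this README documents."""
    return (
        info["sample_rate"] == EXPECTED_SAMPLE_RATE
        and info["channels"] == EXPECTED_CHANNELS
    )


def make_silence(seconds: float) -> io.BytesIO:
    """Build an in-memory 24kHz mono clip of silence to exercise the checker."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(EXPECTED_CHANNELS)
        wav_file.setsampwidth(2)  # 16-bit samples: an assumption, not from the README
        wav_file.setframerate(EXPECTED_SAMPLE_RATE)
        wav_file.writeframes(b"\x00\x00" * int(EXPECTED_SAMPLE_RATE * seconds))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    info = describe_wav(make_silence(0.5))
    print(info, matches_documented_format(info))
```

In practice you would pass the path of a generated file, for example `describe_wav("speech.wav")`, instead of the synthetic clip.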
📄 License
This model is available under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
Citation & Attribution
The original Orpheus model was created by Canopy Labs. This repository contains a quantised version optimised for use with the Orpheus-FastAPI server.
If you use this quantised model in your research or applications, please cite:
@misc{orpheus-tts-2025,
author = {Canopy Labs},
title = {Orpheus-3b-0.1-ft: Text-to-Speech Model},
year = {2025},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/canopylabs/orpheus-3b-0.1-ft}}
}
@misc{orpheus-quantised-2025,
author = {Lex-au},
title = {Orpheus-3b-FT-Q4_K_M: Quantised TTS Model with FastAPI Server},
note = {GGUF quantisation of canopylabs/orpheus-3b-0.1-ft},
year = {2025},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/lex-au/Orpheus-3b-FT-Q4_K_M.gguf}}
}