Orpheus is a high-performance text-to-speech model fine-tuned for natural, emotional speech synthesis. This repository hosts a quantized (GGUF Q2_K) version of the 3-billion-parameter model, reducing resource requirements while maintaining high-quality output.
Orpheus-3b-FT-Q2_K is a 3-billion-parameter text-to-speech model that converts text input into natural speech, supporting multiple voices and emotional expressions. The model has been quantized to the 2-bit GGUF Q2_K format for efficient inference, enabling it to run on consumer-grade hardware.
Model Features
Multi-voice Support: Offers 8 distinctive voice options to suit different scenarios
Emotional Expression: Supports emotion tags such as laughter and sighs to enhance speech naturalness
Efficient Inference: Quantized (Q2_K) for efficient operation on consumer-grade hardware
High-Quality Output: Generates high-quality 24kHz mono audio
Model Capabilities
Text-to-Speech
Emotional Speech Synthesis
Multi-voice Selection
Audio Generation
Use Cases
Voice Interaction
Virtual Assistants: Provides natural speech output for virtual assistants, making voice interactions more natural and enhancing the user experience
Audiobooks: Converts text content into audiobooks, with multiple voice options to suit different content styles
Entertainment Applications
Game Voiceovers: Generates dynamic voices for game characters, with emotion tag support to enhance immersion
🚀 Orpheus-3b-FT-Q2_K
Orpheus-3b-FT-Q2_K is a quantized text-to-speech model offering efficient, high-quality speech synthesis with support for multiple voices and emotional expressions.
✨ Features
Multiple Voice Options: Provides 8 distinct voices, each with different characteristics.
Emotion Tag Support: Supports emotion tags like laughter, sighs, etc., adding expressiveness to speech.
CUDA Acceleration: Optimized for CUDA acceleration on RTX GPUs, enabling efficient inference.
High-Quality Audio: Produces high-quality 24kHz mono audio.
Conversational Naturalness: Fine-tuned for conversational naturalness.
📦 Installation
This quantized model can be loaded into any GGUF-compatible LLM inference server. It is designed to be used with an inference server that connects to the [Orpheus-FastAPI](https://github.com/Lex-au/Orpheus-FastAPI) frontend, which provides both a web UI and OpenAI-compatible API endpoints.
Quick Start
1. Download this quantized model from [lex-au's Orpheus-FastAPI collection](https://huggingface.co/collections/lex-au/orpheus-fastapi-67e125ae03fc96dae0517707).
2. Load the model in your preferred inference server and start the server.
3. Configure the FastAPI server to connect to your inference server by setting the ORPHEUS_API_URL environment variable.
4. Follow the complete installation and setup instructions in the [repository README](https://github.com/Lex-au/Orpheus-FastAPI); a minimal example request is sketched below.
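
Once both servers are running, speech can be requested through the frontend's OpenAI-compatible API. The sketch below is a minimal Python example under two assumptions: that the Orpheus-FastAPI frontend is listening on http://localhost:5005 and that it exposes an OpenAI-style /v1/audio/speech route. Confirm the actual host, port, and endpoint in the repository README.

```python
# Minimal sketch: request speech from the Orpheus-FastAPI frontend.
# Assumptions (verify against the Orpheus-FastAPI README): the frontend
# listens on localhost:5005 and exposes an OpenAI-compatible
# /v1/audio/speech endpoint.
import requests

response = requests.post(
    "http://localhost:5005/v1/audio/speech",
    json={
        "model": "orpheus",                      # model name is an assumption
        "input": "Hello! This is Orpheus speaking.",
        "voice": "tara",                         # any of the 8 voices listed below
    },
    timeout=120,
)
response.raise_for_status()

# The model produces 24kHz mono WAV audio.
with open("output.wav", "wb") as f:
    f.write(response.content)
```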
Audio Samples
You can listen to the model in action with different voices and emotions; the available samples include a default voice sample, Leah (happy), Tara (sad), and Zac (contemplative).
Available Voices
The model supports 8 different voices:
tara: Female, conversational, clear
leah: Female, warm, gentle
jess: Female, energetic, youthful
leo: Male, authoritative, deep
dan: Male, friendly, casual
mia: Female, professional, articulate
zac: Male, enthusiastic, dynamic
zoe: Female, calm, soothing
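
As a usage illustration only, the sketch below loops over the voice names above and saves one sample per voice, reusing the assumed endpoint and payload shape from the Quick Start example.

```python
# Sketch: generate one short sample per available voice.
# Endpoint and payload shape are the same assumptions as in the Quick Start.
import requests

VOICES = ["tara", "leah", "jess", "leo", "dan", "mia", "zac", "zoe"]

for voice in VOICES:
    resp = requests.post(
        "http://localhost:5005/v1/audio/speech",  # assumed default address
        json={"model": "orpheus", "input": f"Hi, my name is {voice}.", "voice": voice},
        timeout=120,
    )
    resp.raise_for_status()
    with open(f"sample_{voice}.wav", "wb") as f:
        f.write(resp.content)
```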
Emotion Tags
You can add expressiveness to speech by inserting tags:
<laugh>, <chuckle>: For laughter sounds
<sigh>: For sighing sounds
<cough>, <sniffle>: For subtle interruptions
<groan>, <yawn>, <gasp>: For additional emotional expression
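
Emotion tags are written inline in the text you send to the model. A hedged example, again assuming the same OpenAI-compatible endpoint as in the Quick Start sketch:

```python
# Sketch: emotion tags are embedded directly in the input text.
import requests

text = "I stayed up way too late reading <yawn> ... <laugh> it was worth it, though."

resp = requests.post(
    "http://localhost:5005/v1/audio/speech",      # assumed default address
    json={"model": "orpheus", "input": text, "voice": "leah"},
    timeout=120,
)
resp.raise_for_status()
with open("emotive.wav", "wb") as f:
    f.write(resp.content)
```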
🔧 Technical Details
| Property | Details |
|----------|---------|
| Architecture | Specialised token-to-audio sequence model |
| Parameters | ~3 billion |
| Quantisation | 2-bit (GGUF Q2_K format) |
| Audio Sample Rate | 24kHz |
| Input | Text with optional voice selection and emotion tags |
| Output | High-quality WAV audio |
| Language | English |
| Hardware Requirements | CUDA-compatible GPU (recommended: RTX series) |
| Integration Method | External LLM inference server + Orpheus-FastAPI frontend |
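
To sanity-check a generated file against the format listed above (24kHz mono WAV), a small script using Python's standard wave module is enough; the file name here is just a placeholder.

```python
# Sketch: verify a generated file matches the documented output format.
# "output.wav" is a placeholder file name.
import wave

with wave.open("output.wav", "rb") as wav:
    assert wav.getframerate() == 24000, "expected a 24kHz sample rate"
    assert wav.getnchannels() == 1, "expected mono audio"
    duration = wav.getnframes() / wav.getframerate()
    print(f"OK: {duration:.2f}s of 24kHz mono audio")
```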
📚 Documentation
Limitations
Currently supports English text only.
Best performance achieved on CUDA-compatible GPUs.
Generation speed depends on GPU capability.
License
This model is available under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
Citation & Attribution
The original Orpheus model was created by Canopy Labs. This repository contains a quantized version optimised for use with the Orpheus-FastAPI server.
If you use this quantized model in your research or applications, please cite:
@misc{orpheus-tts-2025,
  author = {Canopy Labs},
  title = {Orpheus-3b-0.1-ft: Text-to-Speech Model},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/canopylabs/orpheus-3b-0.1-ft}}
}
@misc{orpheus-quantised-2025,
  author = {Lex-au},
  title = {Orpheus-3b-FT-Q2_K: Quantised TTS Model with FastAPI Server},
  note = {GGUF quantisation of canopylabs/orpheus-3b-0.1-ft},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/lex-au/Orpheus-3b-FT-Q4_K_M.gguf}}
}