🚀 Orpheus-3b-FT-Q8_0
Orpheus-3b-FT-Q8_0 is a quantised Text-to-Speech model that efficiently converts text into high-quality, natural speech, supporting multiple voices and emotional expressions.
🚀 Quick Start
Download the Model
- Download this quantised model from lex-au's Orpheus-FASTAPI collection.
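If you use the Hugging Face CLI (part of the huggingface_hub package), one way to fetch the files is sketched below; the repo ID is taken from the citation at the bottom of this page, and the local directory is just an example.

```bash
# Download the quantised GGUF from the Hugging Face Hub.
# Repo ID taken from the citation below; --local-dir is only an example path.
huggingface-cli download lex-au/Orpheus-3b-FT-Q8_0.gguf --local-dir ./models
```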
Load the Model
- Load the model in your preferred inference server and start it. Compatible inference servers include:
- GPUStack - GPU-optimised LLM inference server (my pick) - supports LAN/WAN tensor-split parallelisation
- LM Studio - Load the GGUF model and start the local server
- llama.cpp server - Run with the appropriate model parameters (see the example command after this list)
- Any OpenAI API-compatible server
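For the llama.cpp option, a launch command might look like the sketch below; the filename, host, port, context size and GPU-offload layer count are assumptions to adapt to your download and hardware.

```bash
# Sketch of a llama.cpp server launch for this model.
# Filename, host, port, context size and -ngl value are assumptions.
./llama-server \
  -m ./models/Orpheus-3b-FT-Q8_0.gguf \
  --host 0.0.0.0 \
  --port 5006 \
  -c 8192 \
  -ngl 99
```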
Set up the Frontend
- Clone the Orpheus-FastAPI repository:
git clone https://github.com/Lex-au/Orpheus-FastAPI.git
cd Orpheus-FastAPI
- Configure the FastAPI server to connect to your inference server by setting the ORPHEUS_API_URL environment variable (see the example after these steps).
- Follow the complete installation and setup instructions in the repository README.
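As a minimal sketch, the variable can be exported like this; the URL assumes LM Studio's default local port and should be replaced with whatever endpoint your inference server actually exposes.

```bash
# Point the Orpheus-FastAPI frontend at your inference server.
# The URL below assumes LM Studio's default port (1234); adjust for
# GPUStack, llama.cpp server, or any other OpenAI-compatible endpoint.
export ORPHEUS_API_URL="http://127.0.0.1:1234/v1/completions"
```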
✨ Features
- Multiple Voice Options: Two distinct voices (유나 and 준서), each with different characteristics.
- Emotion Tag Support: Supports emotion tags like laughter, sighs, etc., to add expressiveness to speech.
- CUDA Acceleration: Optimised for CUDA acceleration on RTX GPUs.
- High-Quality Audio: Produces high-quality 24kHz mono audio.
- Conversational Naturalness: Fine-tuned for conversational naturalness.
📦 Installation
This model is designed to be used with an LLM inference server that connects to the [Orpheus-FastAPI](https://github.com/Lex-au/Orpheus-FastAPI) frontend. You can load the quantised model into any of the inference servers listed in the Quick Start section above.
💻 Usage Examples
Available Voices
The model supports two different voices:
- 유나: Female, Korean, melodic
- 준서: Male, Korean, confident
Emotion Tags
You can add expressiveness to speech by inserting tags:
- `<laugh>`, `<chuckle>`: For laughter sounds
- `<sigh>`: For sighing sounds
- `<cough>`, `<sniffle>`: For subtle interruptions
- `<groan>`, `<yawn>`, `<gasp>`: For additional emotional expression
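As an illustration, a request that selects a voice and embeds emotion tags might look like the sketch below. The endpoint path, port and JSON field names are assumptions modelled on the OpenAI-style speech API that the Orpheus-FastAPI README describes; check that README for the exact request format.

```bash
# Sketch of a speech request with a voice and inline emotion tags.
# Endpoint, port and field names are assumptions - see the Orpheus-FastAPI
# README for the API actually exposed by your installation.
curl -X POST "http://localhost:5005/v1/audio/speech" \
  -H "Content-Type: application/json" \
  -d '{
        "input": "오늘 정말 재미있었어요 <laugh> 다음에 또 만나요!",
        "voice": "유나",
        "response_format": "wav"
      }' \
  --output speech.wav
```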
📚 Documentation
Model Description
Orpheus-3b-FT-Q8_0 is a 3 billion parameter Text-to-Speech model. It converts text inputs into natural-sounding speech, supporting multiple voices and emotional expressions. The model has been quantised to 8-bit (Q8_0) format for efficient inference, making it accessible on consumer hardware.
🔧 Technical Details
| Property | Details |
|----------|---------|
| Model Type | Specialised token-to-audio sequence model |
| Parameters | ~3 billion |
| Quantisation | 8-bit (GGUF Q8_0 format) |
| Audio Sample Rate | 24kHz |
| Input | Text with optional voice selection and emotion tags |
| Output | High-quality WAV audio |
| Language | Korean |
| Hardware Requirements | CUDA-compatible GPU (recommended: RTX series) |
| Integration Method | External LLM inference server + Orpheus-FastAPI frontend |
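To confirm that a generated file matches the 24kHz mono WAV output listed above, a quick check with ffprobe (from FFmpeg) looks like this; the filename is just an example.

```bash
# Inspect the generated audio; expect a 24000 Hz, mono WAV stream.
ffprobe -hide_banner speech.wav
```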
🚫 Limitations
- Best performance achieved on CUDA-compatible GPUs.
- Generation speed depends on GPU capability.
📄 License
This model is available under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
Citation & Attribution
The original Orpheus model was created by Canopy Labs. This repository contains a quantised version optimised for use with the Orpheus-FastAPI server.
If you use this quantised model in your research or applications, please cite:
@misc{orpheus-tts-2025,
author = {Canopy Labs},
title = {Orpheus-3b-0.1-ft: Text-to-Speech Model},
year = {2025},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/canopylabs/orpheus-3b-0.1-ft}}
}
@misc{orpheus-quantised-2025,
author = {Lex-au},
title = {Orpheus-3b-FT-Q8_0: Quantised TTS Model with FastAPI Server},
note = {GGUF quantisation of canopylabs/orpheus-3b-0.1-ft},
year = {2025},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/lex-au/Orpheus-3b-FT-Q8_0.gguf}}
}