🚀 Orpheus-3b-FT-Q8_0
Orpheus-3b-FT-Q8_0 is a high-performance Text-to-Speech model. It's a quantised version of canopylabs/3b-fr-ft-research_release, optimised for efficient inference on consumer hardware while maintaining high-quality speech output.
🚀 Quick Start
Download the Model
- Download this quantised model from lex-au's Orpheus-FASTAPI collection.
Load the Model
- Load the model in your preferred inference server and start the server.
Set up the Front-end
- Clone the Orpheus-FastAPI repository:
    git clone https://github.com/Lex-au/Orpheus-FastAPI.git
    cd Orpheus-FastAPI
- Configure the FastAPI server to connect to your inference server by setting the ORPHEUS_API_URL environment variable.
- Follow the complete installation and setup instructions in the repository README.
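As a rough illustration of the configuration step, the sketch below shows how a Python process might resolve the inference server URL from ORPHEUS_API_URL. The function name and the fallback URL are assumptions made for this example and are not taken from the Orpheus-FastAPI code; the repository README remains the reference for the real behaviour.

```python
import os

# Hypothetical fallback for illustration; the real default (if any) is defined
# by Orpheus-FastAPI, not by this sketch.
ASSUMED_DEFAULT_URL = "http://127.0.0.1:1234/v1/completions"


def resolve_api_url(env=None):
    """Return the URL stored in ORPHEUS_API_URL, or the assumed fallback."""
    env = os.environ if env is None else env
    url = env.get("ORPHEUS_API_URL", "").strip()
    return url or ASSUMED_DEFAULT_URL
```

Calling `resolve_api_url()` with no argument reads the real environment; passing a dictionary makes the lookup easy to try without changing your shell.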
✨ Features
- Multiple Voice Options: 3 distinct voice options with different characteristics.
- Emotion Support: Support for emotion tags like laughter, sighs, etc.
- GPU Acceleration: Optimised for CUDA acceleration on RTX GPUs.
- High-Quality Audio: Produces high-quality 24kHz mono audio.
- Naturalness: Fine-tuned for conversational naturalness.
📦 Installation
Compatible Inference Servers
This quantised model can be loaded into any of these LLM inference servers:
- GPUStack - GPU-optimised LLM inference server (my pick); supports LAN/WAN tensor-split parallelisation.
- LM Studio - Load the GGUF model and start the local server.
- llama.cpp server - Run with the appropriate model parameters.
- Any other OpenAI API-compatible server.
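To give a feel for what talking to an OpenAI API-compatible server involves, here is a minimal sketch that builds (but does not send) a completions request using only the Python standard library. The model identifier, the `max_tokens` value, and the helper name are assumptions for illustration; the fields your server actually expects may differ, and in normal use the Orpheus-FastAPI frontend issues these requests for you.

```python
import json
import urllib.request


def build_completion_request(base_url, prompt, model="orpheus-3b-ft-q8_0"):
    """Build a POST request for an OpenAI-style /v1/completions endpoint."""
    payload = {
        "model": model,      # hypothetical identifier; use the name your server reports
        "prompt": prompt,
        "max_tokens": 1024,  # illustrative value, not a recommendation
        "stream": False,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. The reply would contain generated tokens rather than audio, which is why the frontend is still needed.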
💻 Usage Examples
Available Voices
The model supports 3 different voices:
- Pierre: Male, French, sophisticated.
- Amelie: Female, French, elegant.
- Marie: Female, French, spirited.
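A small sketch of voice selection in Python follows. The voice names come from the list above; the `voice: text` prompt layout, the default voice, and the function name are assumptions made for this example, so check the Orpheus-FastAPI README for the format the frontend really uses.

```python
# Voice names and descriptions as listed in this model card.
VOICES = {
    "pierre": "Male, French, sophisticated",
    "amelie": "Female, French, elegant",
    "marie": "Female, French, spirited",
}


def build_prompt(text, voice="pierre"):
    """Validate the voice name and prefix the text with it.

    The "voice: text" layout is an assumption for illustration only.
    """
    key = voice.strip().lower()
    if key not in VOICES:
        raise ValueError(f"Unknown voice {voice!r}; choose from {sorted(VOICES)}")
    return f"{key}: {text}"
```

Rejecting unknown names early avoids sending a request that the model would otherwise read as plain text.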
Emotion Tags
You can add expressiveness to speech by inserting tags:
- <laugh>, <chuckle>: For laughter sounds.
- <sigh>: For sighing sounds.
- <cough>, <sniffle>: For subtle interruptions.
- <groan>, <yawn>, <gasp>: For additional emotional expression.
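Because a mistyped tag is easy to miss, it can help to check input text before synthesis. The sketch below is one possible check in Python: it uses the tag names listed above, while the helper name and the choice of case-sensitive matching are assumptions for this example.

```python
import re

# Tags listed in this model card.
SUPPORTED_TAGS = {"laugh", "chuckle", "sigh", "cough", "sniffle", "groan", "yawn", "gasp"}

# Matches anything of the form <word>; matching is case-sensitive because the
# card lists every tag in lowercase.
TAG_PATTERN = re.compile(r"<([A-Za-z]+)>")


def unsupported_tags(text):
    """Return the sorted, de-duplicated tag names in text that the card does not list."""
    found = TAG_PATTERN.findall(text)
    return sorted({tag for tag in found if tag not in SUPPORTED_TAGS})
```

An empty result means every tag in the text appears in the list above; anything returned is worth checking for typos.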
📚 Documentation
Model Description
Orpheus-3b-FT-Q8_0 is a 3-billion-parameter Text-to-Speech model that converts text inputs into natural-sounding speech with support for multiple voices and emotional expressions. The model has been quantised to 8-bit (Q8_0) format for efficient inference, making it accessible on consumer hardware.
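To make the memory implication concrete, the following back-of-the-envelope sketch estimates the weight storage for Q8_0. It relies on the Q8_0 block layout used by GGUF (32 8-bit weights sharing one 16-bit scale, so 8.5 bits per weight) and assumes exactly 3 billion parameters, all stored as Q8_0. The real file will differ somewhat, since the parameter count is approximate and a file also carries metadata and may keep some tensors in other types.

```python
# Q8_0 block layout in GGUF: 32 int8 weights plus one 16-bit scale per block.
WEIGHTS_PER_BLOCK = 32
BYTES_PER_BLOCK = 32 + 2  # 34 bytes, i.e. 8.5 bits per weight


def q8_0_size_bytes(n_params):
    """Estimate storage for n_params weights quantised to Q8_0."""
    n_blocks = -(-n_params // WEIGHTS_PER_BLOCK)  # ceiling division
    return n_blocks * BYTES_PER_BLOCK


size = q8_0_size_bytes(3_000_000_000)  # assumed parameter count
print(f"{size / 1e9:.2f} GB")  # prints "3.19 GB"
```

Under these assumptions the weights alone come to roughly 3.2 GB, before the inference server's own context and buffers are counted.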
Technical Specifications
| Property | Details |
|---|---|
| Model Type | Specialised token-to-audio sequence model |
| Parameters | ~3 billion |
| Quantisation | 8-bit (GGUF Q8_0 format) |
| Audio Sample Rate | 24kHz |
| Input | Text with optional voice selection and emotion tags |
| Output | High-quality WAV audio |
| Language | French |
| Hardware Requirements | CUDA-compatible GPU (recommended: RTX series) |
| Integration Method | External LLM inference server + Orpheus-FastAPI frontend |
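The output format in the table can be illustrated with Python's standard `wave` module. The sketch below wraps raw PCM samples in a WAV container at 24kHz mono; the 16-bit sample width and both helper names are assumptions for this example, as the card does not state the bit depth.

```python
import io
import wave

SAMPLE_RATE = 24_000  # 24kHz, from the specifications above
CHANNELS = 1          # mono, per the feature list
SAMPLE_WIDTH = 2      # assumption: 16-bit PCM (2 bytes per sample)


def pcm_to_wav(pcm_bytes):
    """Wrap raw PCM samples in a WAV container and return the file bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm_bytes)
    return buffer.getvalue()


def duration_seconds(pcm_bytes):
    """Clip length implied by the byte count at the settings above."""
    return len(pcm_bytes) / (SAMPLE_RATE * CHANNELS * SAMPLE_WIDTH)
```

With these settings one second of audio occupies 48,000 bytes of sample data, which gives a quick way to sanity-check the length of a generated clip.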
Limitations
- Best performance is achieved on CUDA-compatible GPUs.
- Generation speed depends on GPU capability.
📄 License
This model is available under the Apache License 2.0.
🔧 Technical Details
Citation & Attribution
The original Orpheus model was created by Canopy Labs. This repository contains a quantised version optimised for use with the Orpheus-FastAPI server.
If you use this quantised model in your research or applications, please cite:
@misc{orpheus-tts-2025,
author = {Canopy Labs},
title = {Orpheus-3b-0.1-ft: Text-to-Speech Model},
year = {2025},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/canopylabs/orpheus-3b-0.1-ft}}
}
@misc{orpheus-quantised-2025,
author = {Lex-au},
title = {Orpheus-3b-FT-Q8_0: Quantised TTS Model with FastAPI Server},
note = {GGUF quantisation of canopylabs/orpheus-3b-0.1-ft},
year = {2025},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/lex-au/Orpheus-3b-FT-Q8_0.gguf}}
}