🚀 Orpheus-3b-Chinese-FT-Q8_0
Orpheus-3b-Chinese-FT-Q8_0 is a quantized Text-to-Speech model that converts text into natural-sounding Mandarin speech, with support for multiple voices and emotional expressions. This version is optimized for efficient inference while maintaining high-quality output.
🚀 Quick Start
- Download this quantized model from lex-au's Orpheus-FASTAPI collection.
- Load the model in your preferred inference server and start the server.
- Clone the Orpheus-FastAPI repository:
git clone https://github.com/Lex-au/Orpheus-FastAPI.git
cd Orpheus-FastAPI
- Configure the FastAPI server to connect to your inference server by setting the `ORPHEUS_API_URL` environment variable (see the example sketch after this list).
- Follow the complete installation and setup instructions in the repository README.
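As a rough sketch, pointing the frontend at a locally running inference server might look like this; the host, port, and path below are placeholders, so substitute the completions endpoint your inference server actually exposes:

```bash
# Placeholder values - replace with the host, port, and path of your own inference server.
export ORPHEUS_API_URL="http://127.0.0.1:5006/v1/completions"

# Then start the Orpheus-FastAPI frontend, following the startup
# instructions in the repository README.
```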
✨ Features
- Two voice options with distinct characteristics.
- Support for emotion tags like laughter, sighs, etc.
- Optimized for CUDA acceleration on RTX GPUs.
- Produces high-quality 24kHz mono audio.
- Fine-tuned for conversational naturalness.
📦 Installation
Compatible Inference Servers
This quantized model can be loaded into any of these LLM inference servers:
- GPUStack - GPU-optimized LLM inference server (My pick) - supports LAN/WAN tensor-split parallelization.
- LM Studio - Load the GGUF model and start the local server.
- llama.cpp server - Run with the appropriate model parameters (see the example command after this list).
- Any other OpenAI API-compatible server.
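For the llama.cpp option, a launch command could look roughly like the sketch below; the model path, context size, port, and GPU-offload settings are illustrative and should be tuned for your hardware:

```bash
# Illustrative llama.cpp server launch - adjust paths and parameters for your setup.
./llama-server \
  -m ./Orpheus-3b-Chinese-FT-Q8_0.gguf \
  -c 4096 \
  --port 5006 \
  --n-gpu-layers 99
```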
💻 Usage Examples
Basic Usage
This model is designed to be served by an LLM inference server, with the Orpheus-FastAPI frontend connecting to that server and exposing both a web UI and OpenAI-compatible API endpoints.
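A minimal request sketch is shown below. It assumes the Orpheus-FastAPI frontend is listening on localhost port 5005 and exposes an OpenAI-style `/v1/audio/speech` route; the exact host, port, and path depend on your deployment, so check the repository README.

```bash
# Hypothetical request - host, port, and route depend on your Orpheus-FastAPI deployment.
curl http://127.0.0.1:5005/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
        "input": "你好，很高兴见到你。",
        "voice": "长乐",
        "response_format": "wav"
      }' \
  --output hello.wav
```

The `voice` field selects one of the two voices listed below, and the returned file is 24kHz mono WAV as noted under Features.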
Available Voices
The model supports two voices:
- 长乐: Female, Mandarin, gentle.
- 白芷: Female, Mandarin, clear.
Emotion Tags
You can add expressiveness to speech by inserting tags into the input text (see the request sketch after this list):
- `<laugh>`, `<chuckle>`: For laughter sounds.
- `<sigh>`: For sighing sounds.
- `<cough>`, `<sniffle>`: For subtle interruptions.
- `<groan>`, `<yawn>`, `<gasp>`: For additional emotional expression.
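Tags are embedded directly in the input text. The sketch below reuses the hypothetical endpoint from the Basic Usage example; as there, the host, port, and route are assumptions rather than fixed values:

```bash
# Emotion tags are placed inline in the text to be spoken.
curl http://127.0.0.1:5005/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
        "input": "今天真是太有意思了 <laugh> 不过我也有点累了 <sigh>",
        "voice": "白芷",
        "response_format": "wav"
      }' \
  --output expressive.wav
```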
📚 Documentation
Model Description
Orpheus-3b-Chinese-FT-Q8_0 is a 3-billion-parameter Text-to-Speech model that converts text input into natural-sounding speech, with support for multiple voices and emotional expressions. The model has been quantized to 8-bit (Q8_0) format for efficient inference, making it accessible on consumer hardware.
🔧 Technical Details
| Property | Details |
|----------|---------|
| Model Type | Specialised token-to-audio sequence model |
| Parameters | ~3 billion |
| Quantisation | 8-bit (GGUF Q8_0 format) |
| Audio Sample Rate | 24kHz |
| Input | Text with optional voice selection and emotion tags |
| Output | High-quality WAV audio |
| Language | Mandarin |
| Hardware Requirements | CUDA-compatible GPU (recommended: RTX series) |
| Integration Method | External LLM inference server + Orpheus-FastAPI frontend |
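If you want to confirm the 24kHz mono WAV output noted above, a generated file can be inspected with a standard tool such as ffprobe (assuming FFmpeg is installed; `hello.wav` is the hypothetical output from the earlier example):

```bash
# Print the stream details (sample rate, channels, codec) of a generated file.
ffprobe -hide_banner hello.wav
```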
⚠️ Limitations
- Best performance achieved on CUDA-compatible GPUs.
- Generation speed depends on GPU capability.
📄 License
This model is available under the Apache License 2.0.
📚 Citation & Attribution
The original Orpheus model was created by Canopy Labs. This repository contains a quantized version optimized for use with the Orpheus-FastAPI server.
If you use this quantized model in your research or applications, please cite:
@misc{orpheus-tts-2025,
author = {Canopy Labs},
title = {Orpheus-3b-0.1-ft: Text-to-Speech Model},
year = {2025},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/canopylabs/orpheus-3b-0.1-ft}}
}
@misc{orpheus-quantised-2025,
author = {Lex-au},
title = {Orpheus-3b-FT-Q8_0: Quantised TTS Model with FastAPI Server},
note = {GGUF quantisation of canopylabs/orpheus-3b-0.1-ft},
year = {2025},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/lex-au/Orpheus-3b-FT-Q8_0.gguf}}
}