# 🚀 Orpheus-3b-FT-Q8_0
Orpheus-3b-FT-Q8_0 is a quantized Text-to-Speech model. It efficiently converts text into high-quality, natural speech, with support for voice selection and emotional expression.
## ✨ Features
- High-Performance TTS: Orpheus is a high-performance Text-to-Speech model fine-tuned for natural, emotional speech synthesis.
- Quantized for Efficiency: The 8-bit quantized version of the 3B-parameter model is optimized for efficiency while maintaining high-quality output.
- Voice and Emotions: Ships with one distinct voice and supports emotion tags such as laughter and sighs.
- Hardware Compatibility: Optimized for CUDA acceleration on RTX GPUs and can run on consumer hardware.
- High-Quality Audio: Produces high-quality 24kHz mono audio, fine-tuned for conversational naturalness.
## 📦 Installation
### Compatible Inference Servers
This quantized model can be loaded into any of these LLM inference servers:
- GPUStack - GPU-optimized LLM inference server (my pick); supports LAN/WAN tensor-split parallelisation
- LM Studio - load the GGUF model and start the local server
- llama.cpp server - run with the appropriate model parameters
- Any other OpenAI-compatible API server
### Quick Start
1. Download this quantized model from [lex-au's Orpheus-FASTAPI collection](https://huggingface.co/collections/lex-au/orpheus-fastapi-67e125ae03fc96dae0517707).
2. Load the model in your preferred inference server and start the server.
3. Clone the Orpheus-FastAPI repository:
   ```shell
   git clone https://github.com/Lex-au/Orpheus-FastAPI.git
   cd Orpheus-FastAPI
   ```
4. Configure the FastAPI server to connect to your inference server by setting the `ORPHEUS_API_URL` environment variable.
5. Follow the complete installation and setup instructions in the [repository README](https://github.com/Lex-au/Orpheus-FastAPI).
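Step 4 can be sketched in Python. Note that the fallback URL below is a placeholder of my choosing, not the project's documented default:

```python
import os

def get_api_url(default="http://127.0.0.1:1234/v1/completions"):
    # The Orpheus-FastAPI frontend reads ORPHEUS_API_URL to locate the
    # inference server; the fallback here is illustrative only.
    return os.environ.get("ORPHEUS_API_URL", default)

os.environ["ORPHEUS_API_URL"] = "http://localhost:5006/v1/completions"
print(get_api_url())  # → http://localhost:5006/v1/completions
```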
## 💻 Usage Examples
### Basic Usage
This model is designed to be used with an LLM inference server that connects to the [Orpheus-FastAPI](https://github.com/Lex-au/Orpheus-FastAPI) frontend, which provides both a web UI and OpenAI-compatible API endpoints.
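As a rough sketch of what a client request might look like, the helper below builds a payload shaped like OpenAI's `/v1/audio/speech` body. The endpoint path, field names, and model name are assumptions based on the "OpenAI-compatible" claim; check the repository README for the exact contract:

```python
import json

def build_speech_request(text, voice="ऋतिका", response_format="wav"):
    # Field names mirror OpenAI's audio API; they are assumed here,
    # not taken from the Orpheus-FastAPI documentation.
    return {
        "model": "orpheus",  # hypothetical model identifier
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }

payload = build_speech_request("नमस्ते! <laugh> आप कैसे हैं?")
print(json.dumps(payload, ensure_ascii=False, indent=2))
```

You would POST this JSON to the frontend (e.g. with `requests.post`) and write the binary response body to a `.wav` file.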
### Available Voices
The model ships with a single voice:
- `ऋतिका`: female, Hindi, expressive
### Emotion Tags
You can add expressiveness to speech by inserting tags into the text:
- `<laugh>`, `<chuckle>`: laughter sounds
- `<sigh>`: sighing sounds
- `<cough>`, `<sniffle>`: subtle interruptions
- `<groan>`, `<yawn>`, `<gasp>`: additional emotional expression
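A small sketch for catching typos in emotion tags before sending text to the server. The tag set mirrors the list above; the helper itself is illustrative and not part of the project:

```python
import re

SUPPORTED_TAGS = {"laugh", "chuckle", "sigh", "cough",
                  "sniffle", "groan", "yawn", "gasp"}

def unsupported_tags(text):
    # Collect <word> tokens and return any not in the supported set.
    return [t for t in re.findall(r"<(\w+)>", text)
            if t not in SUPPORTED_TAGS]

print(unsupported_tags("ठीक है <laugh> ... <shout>"))  # → ['shout']
```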
## 📚 Documentation
### Model Description
Orpheus-3b-FT-Q8_0 is a 3-billion-parameter Text-to-Speech model that converts text inputs into natural-sounding speech, with support for voice selection and emotional expression. The model has been quantized to 8-bit (Q8_0) format for efficient inference, making it accessible on consumer hardware.
## 🔧 Technical Details
| Property | Details |
|----------|---------|
| Model Type | Text-to-Speech model |
| Parameters | ~3 billion |
| Quantisation | 8-bit (GGUF Q8_0 format) |
| Audio Sample Rate | 24kHz |
| Input | Text with optional voice selection and emotion tags |
| Output | High-quality WAV audio |
| Language | Hindi |
| Hardware Requirements | CUDA-compatible GPU (recommended: RTX series) |
| Integration Method | External LLM inference server + Orpheus-FastAPI frontend |
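From the table, the output is 24kHz mono WAV. Assuming 16-bit PCM samples (the bit depth is not stated in this card), the raw data rate and clip duration work out as follows:

```python
SAMPLE_RATE = 24_000  # Hz, from the table above
CHANNELS = 1          # mono
BYTES_PER_SAMPLE = 2  # 16-bit PCM -- an assumption, not stated in the card

def data_rate_bytes_per_sec():
    # Raw PCM throughput before any WAV header overhead.
    return SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE

def duration_seconds(num_samples):
    # Duration of a mono clip given its sample count.
    return num_samples / SAMPLE_RATE

print(data_rate_bytes_per_sec())  # → 48000
print(duration_seconds(120_000))  # → 5.0
```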
## 📄 License
This model is available under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
## Citation & Attribution
The original Orpheus model was created by Canopy Labs. This repository contains a quantized version optimized for use with the Orpheus-FastAPI server.
If you use this quantized model in your research or applications, please cite:
```bibtex
@misc{orpheus-tts-2025,
  author = {Canopy Labs},
  title = {Orpheus-3b-0.1-ft: Text-to-Speech Model},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/canopylabs/orpheus-3b-0.1-ft}}
}

@misc{orpheus-quantised-2025,
  author = {Lex-au},
  title = {Orpheus-3b-FT-Q8_0: Quantised TTS Model with FastAPI Server},
  note = {GGUF quantisation of canopylabs/orpheus-3b-0.1-ft},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/lex-au/Orpheus-3b-FT-Q8_0.gguf}}
}
```
## ⚠️ Important Note
- Currently supports Hindi text only.
- Best performance is achieved on CUDA-compatible GPUs.
- Generation speed depends on GPU capability.