# 🚀 Orpheus-3b-German-FT-Q8_0

Orpheus-3b-German-FT-Q8_0 is a quantised Text-to-Speech model, fine-tuned for natural and emotional speech synthesis. It offers high-quality output with efficient performance.
## 🚀 Quick Start

- Download this quantised model from [lex-au's Orpheus-FASTAPI collection](https://huggingface.co/collections/lex-au/orpheus-fastapi-67e125ae03fc96dae0517707).
- Load the model in your preferred inference server and start the server.
- Clone the Orpheus-FastAPI repository:

  ```shell
  git clone https://github.com/Lex-au/Orpheus-FastAPI.git
  cd Orpheus-FastAPI
  ```

- Configure the FastAPI server to connect to your inference server by setting the `ORPHEUS_API_URL` environment variable.
- Follow the complete installation and setup instructions in the [repository README](https://github.com/Lex-au/Orpheus-FastAPI).
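The configuration step above can be sketched as a minimal shell snippet. The host, port, and endpoint path shown here are placeholders, not values from this model card; check your inference server's settings and the Orpheus-FastAPI README for the exact URL to use:

```shell
# Point Orpheus-FastAPI at your inference server's completions endpoint.
# The host, port, and path below are illustrative placeholders only.
export ORPHEUS_API_URL="http://127.0.0.1:5006/v1/completions"
echo "Orpheus-FastAPI will query: $ORPHEUS_API_URL"
```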
## ✨ Features

- Multiple Voices: 3 distinct voice options with different characteristics: `Jana` (female, German, clear), `Thomas` (male, German, authoritative), and `Max` (male, German, energetic).
- Emotion Tags: Support for emotion tags like `<laugh>`, `<chuckle>`, `<sigh>`, `<cough>`, `<sniffle>`, `<groan>`, `<yawn>`, and `<gasp>` to add expressiveness to speech.
- CUDA Acceleration: Optimised for CUDA acceleration on RTX GPUs.
- High-Quality Audio: Produces high-quality 24 kHz mono audio.
- Conversational Naturalness: Fine-tuned for conversational naturalness.
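As a small illustration of the voices and tags listed above, the snippet below embeds emotion tags in a German prompt string. The `voice: text` prompt convention is common for Orpheus-family models but is an assumption here; verify the expected prompt format against the Orpheus-FastAPI documentation.

```python
# Illustrative helper: prefix the text with a voice name and embed
# emotion tags inline. The "voice: text" prompt shape is an assumption,
# not something this model card specifies.
def compose_prompt(voice: str, text: str) -> str:
    return f"{voice}: {text}"

text = "Das ist ja unglaublich! <laugh> Damit habe ich nicht gerechnet. <sigh>"
prompt = compose_prompt("Jana", text)
print(prompt)
# Jana: Das ist ja unglaublich! <laugh> Damit habe ich nicht gerechnet. <sigh>
```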
## 📦 Installation

This quantised model can be loaded into any of these LLM inference servers:

- GPUStack - GPU-optimised LLM inference server (my pick); supports LAN/WAN tensor-split parallelisation.
- LM Studio - load the GGUF model and start the local server.
- llama.cpp server - run with the appropriate model parameters.
- Any other OpenAI API-compatible server.
## 💻 Usage Examples

### Basic Usage

This model is designed to be used with an LLM inference server that connects to the [Orpheus-FastAPI](https://github.com/Lex-au/Orpheus-FastAPI) frontend, which provides both a web UI and OpenAI-compatible API endpoints.
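As a sketch of how a client might call the frontend's OpenAI-compatible API, the snippet below builds a speech request payload. The endpoint path and field names follow the OpenAI `/v1/audio/speech` convention and are assumptions here; consult the Orpheus-FastAPI README for the actual API surface.

```python
import json

# Build a request payload in the style of the OpenAI audio speech API.
# Field names, the model identifier, and the endpoint path are
# assumptions for illustration, not confirmed by this model card.
def build_speech_request(text: str, voice: str = "Jana") -> dict:
    return {
        "model": "orpheus",        # illustrative model identifier
        "input": text,
        "voice": voice,
        "response_format": "wav",
    }

payload = build_speech_request("Hallo und willkommen! <chuckle>", voice="Thomas")
print(json.dumps(payload, ensure_ascii=False))

# To send it against a running Orpheus-FastAPI instance (path assumed):
#   import requests
#   r = requests.post("http://127.0.0.1:5005/v1/audio/speech", json=payload)
#   open("output.wav", "wb").write(r.content)
```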
## 📚 Documentation

### Model Description

Orpheus-3b-German-FT-Q8_0 is a 3-billion-parameter Text-to-Speech model that converts text inputs into natural-sounding speech with support for multiple voices and emotional expressions. The model has been quantised to 8-bit (Q8_0) format for efficient inference, making it accessible on consumer hardware.
### Technical Specifications

| Property | Details |
|----------|---------|
| Model Type | Specialised token-to-audio sequence model |
| Parameters | ~3 billion |
| Quantisation | 8-bit (GGUF Q8_0 format) |
| Audio Sample Rate | 24 kHz |
| Input | Text with optional voice selection and emotion tags |
| Output | High-quality WAV audio |
| Language | German |
| Hardware Requirements | CUDA-compatible GPU (recommended: RTX series) |
| Integration Method | External LLM inference server + Orpheus-FastAPI frontend |
### Limitations

- Best performance achieved on CUDA-compatible GPUs.
- Generation speed depends on GPU capability.
## 🔧 Technical Details

- Architecture: Specialised token-to-audio sequence model.
- The model is quantised to 8-bit (GGUF Q8_0 format) to balance efficiency and quality, enabling it to run on consumer hardware.
## 📄 License

This model is available under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
## 📚 Citation & Attribution

The original Orpheus model was created by Canopy Labs. This repository contains a quantised version optimised for use with the Orpheus-FastAPI server.

If you use this quantised model in your research or applications, please cite:
```bibtex
@misc{orpheus-tts-2025,
  author = {Canopy Labs},
  title = {Orpheus-3b-0.1-ft: Text-to-Speech Model},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/canopylabs/orpheus-3b-0.1-ft}}
}

@misc{orpheus-quantised-2025,
  author = {Lex-au},
  title = {Orpheus-3b-FT-Q8_0: Quantised TTS Model with FastAPI Server},
  note = {GGUF quantisation of canopylabs/orpheus-3b-0.1-ft},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/lex-au/Orpheus-3b-FT-Q8_0.gguf}}
}
```