Orpheus Open-source Text-to-Speech Model - Free Deployment for Natural and Emotive Speech Synthesis

Orpheus-3b-FT-Q8_0.gguf

Developed by lex-au

Orpheus is a high-performance text-to-speech model fine-tuned to achieve natural and emotionally rich speech synthesis. This repository hosts the 8-bit quantized version of the 3-billion-parameter model, optimizing efficiency while maintaining high-quality output.

Speech Synthesis · Supports Multiple Languages · Open Source License: Apache-2.0 · #8-bit quantized TTS · #Multi-voice synthesis · #Emotional speech generation

Downloads 4,351

Release Time: 3/21/2025

Model Overview

Orpheus-3b-FT-Q8_0 is a 3-billion-parameter text-to-speech model that converts text input into natural speech, supporting multiple voice tones and emotional expressions. The model has been quantized to 8-bit (Q8_0) format for efficient inference, enabling it to run on consumer-grade hardware.

Model Features

Multi-voice support

Offers 8 selectable voice tones with different characteristics, including male/female voices and various emotional expressions

Emotional tags

Supports inserting emotional tags like laughter and sighs to enhance speech expressiveness

Efficient inference

The 8-bit quantized version optimizes efficiency for running on consumer-grade hardware

High-quality audio

Generates 24kHz mono high-quality audio

CUDA acceleration

Optimized for CUDA acceleration on RTX graphics cards

Model Capabilities

Text-to-speech

Multi-voice speech synthesis

Emotional speech generation

24kHz audio output

Use Cases

Voice interaction

Smart assistants

Provides natural voice output for smart assistants

Generates emotionally rich conversational speech

Audiobooks

Converts text content into audiobooks

Supports different character voices and emotional expressions

Media production

Video narration

Generates professional narration for video content

Offers multiple professional voice options

Game voice acting

Generates dynamic voices for game characters

Supports emotional tags to enhance expressiveness

🚀 Orpheus-3b-FT-Q8_0

This is a high-performance quantised Text-to-Speech model, fine-tuned for natural, emotional speech synthesis, offering efficient inference on consumer hardware.

🚀 Quick Start

Step 1: Download the Model

Download this quantised model from lex-au's Orpheus-FASTAPI collection.

Step 2: Load the Model

Load the model in your preferred inference server and start the server. Compatible inference servers include:

GPUStack: GPU-optimised LLM inference server (my pick); supports LAN/WAN tensor-split parallelisation
LM Studio: Load the GGUF model and start the local server
llama.cpp server: Run with the appropriate model parameters
Any OpenAI API-compatible server

Step 3: Clone the Repository

git clone https://github.com/Lex-au/Orpheus-FastAPI.git
cd Orpheus-FastAPI

Step 4: Configure the Server

Configure the FastAPI server to connect to your inference server by setting the ORPHEUS_API_URL environment variable.
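As a rough illustration of this step, the sketch below reads ORPHEUS_API_URL from the environment and checks that it looks like an http(s) URL before anything tries to use it. The fallback URL and the function name `resolve_api_url` are hypothetical and chosen only for the example; the repository README is the authority on the value your inference server needs.

```python
import os
from urllib.parse import urlparse

# Hypothetical fallback, for illustration only. Check the repository README
# for the URL your inference server actually exposes.
ASSUMED_DEFAULT_URL = "http://127.0.0.1:1234/v1/completions"


def resolve_api_url(env=None):
    """Return the inference-server URL from ORPHEUS_API_URL, or the assumed default."""
    env = os.environ if env is None else env
    url = env.get("ORPHEUS_API_URL", "").strip() or ASSUMED_DEFAULT_URL
    parsed = urlparse(url)
    # Reject values such as "localhost:1234" that lack an http(s) scheme.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"ORPHEUS_API_URL is not a valid http(s) URL: {url!r}")
    return url
```

Calling `resolve_api_url()` with no arguments reads the real process environment; passing a plain dict makes the check easy to try without changing your shell.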

Step 5: Follow the Instructions

Follow the complete installation and setup instructions in the repository README.

✨ Features

Multiple Voice Options: 8 distinct voice options with different characteristics.
Emotion Support: Support for emotion tags like laughter, sighs, etc.
GPU Acceleration: Optimised for CUDA acceleration on RTX GPUs.
High-Quality Audio: Produces high-quality 24kHz mono audio.
Conversational Naturalness: Fine-tuned for conversational naturalness.

💻 Usage Examples

Basic Usage

This model is designed to be used with an LLM inference server that connects to the [Orpheus-FastAPI](https://github.com/Lex-au/Orpheus-FastAPI) frontend, which provides both a web UI and OpenAI-compatible API endpoints.
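A minimal sketch of what a client call to the frontend might look like, assuming it follows the shape of the OpenAI speech API (a JSON POST with `model`, `input`, `voice` and `response_format`). The host, port, route and model label below are assumptions made for illustration and are not stated in this document; confirm them against the repository README.

```python
import json
import urllib.request

# Assumptions (not stated in this document): the frontend listens on
# localhost:5005 and exposes an OpenAI-style /v1/audio/speech route.
ASSUMED_FRONTEND_URL = "http://localhost:5005/v1/audio/speech"


def build_speech_request(text, voice="tara", url=ASSUMED_FRONTEND_URL):
    """Build (but do not send) a POST request in the OpenAI speech-API shape."""
    payload = {
        "model": "orpheus",          # hypothetical model label
        "input": text,
        "voice": voice,
        "response_format": "wav",
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text, path, voice="tara"):
    """Send the request and save the returned audio bytes. Needs a running frontend."""
    request = build_speech_request(text, voice=voice)
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

With both servers running, `synthesize_to_file("Hello from Orpheus.", "hello.wav", voice="leah")` would write the response body to disk, provided the assumed route matches your deployment.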

Advanced Usage

You can use different voices and emotion tags to add expressiveness to the speech. For example, insert <laugh> to add laughter sounds.

📚 Documentation

Model Description

Orpheus-3b-FT-Q8_0 is a 3 billion parameter Text-to-Speech model that converts text inputs into natural-sounding speech with support for multiple voices and emotional expressions. The model has been quantised to 8-bit (Q8_0) format for efficient inference, making it accessible on consumer hardware.
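To make the 8-bit claim concrete, here is a simplified pure-Python sketch of Q8_0-style block quantisation as used by GGUF: weights are grouped into blocks of 32, and each block stores one scale plus 32 signed 8-bit integers. It is an illustration rather than the code used to produce this file; in particular, it keeps the scale as a Python float, whereas the on-disk format stores it as a 16-bit float. The size estimate ignores file metadata and any tensors kept in another format.

```python
BLOCK_SIZE = 32  # Q8_0 groups weights into blocks of 32


def quantize_block_q8_0(values):
    """Quantise one block of 32 floats to (scale, list of ints in [-127, 127])."""
    if len(values) != BLOCK_SIZE:
        raise ValueError("Q8_0 blocks hold exactly 32 values")
    amax = max(abs(v) for v in values)
    scale = amax / 127.0
    if scale == 0.0:
        # An all-zero block needs no scaling.
        return 0.0, [0] * BLOCK_SIZE
    quants = [max(-127, min(127, round(v / scale))) for v in values]
    return scale, quants


def dequantize_block_q8_0(scale, quants):
    """Recover approximate floats from a quantised block."""
    return [scale * q for q in quants]


def estimated_size_gb(n_params):
    """Rough weight size: 32 one-byte values plus a 2-byte scale per block."""
    bytes_per_weight = (BLOCK_SIZE + 2) / BLOCK_SIZE  # 34 bytes per 32 weights
    return n_params * bytes_per_weight / 1e9
```

Under these assumptions, `estimated_size_gb(3e9)` comes to about 3.19 GB for a nominal 3 billion weights, which is why the quantised model fits on consumer GPUs.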

Available Voices

The model supports 8 different voices:

tara: Female, conversational, clear
leah: Female, warm, gentle
jess: Female, energetic, youthful
leo: Male, authoritative, deep
dan: Male, friendly, casual
mia: Female, professional, articulate
zac: Male, enthusiastic, dynamic
zoe: Female, calm, soothing
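The voice list above can be captured in a small lookup so that a client rejects unknown names before sending a request. This is a sketch; the helper names `validate_voice` and `voices_by_gender` are hypothetical, and whether the server itself accepts mixed-case names is not covered by this document, so the helper normalises to the lowercase names listed here.

```python
# Voice names and descriptions copied from the list above.
VOICES = {
    "tara": ("female", "conversational, clear"),
    "leah": ("female", "warm, gentle"),
    "jess": ("female", "energetic, youthful"),
    "leo": ("male", "authoritative, deep"),
    "dan": ("male", "friendly, casual"),
    "mia": ("female", "professional, articulate"),
    "zac": ("male", "enthusiastic, dynamic"),
    "zoe": ("female", "calm, soothing"),
}


def validate_voice(name):
    """Return the normalised voice name, or raise if it is not one of the 8 voices."""
    key = name.strip().lower()
    if key not in VOICES:
        raise ValueError(f"Unknown voice {name!r}; choose one of {sorted(VOICES)}")
    return key


def voices_by_gender(gender):
    """List the voice names whose description matches the given gender."""
    return sorted(v for v, (g, _) in VOICES.items() if g == gender.lower())
```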

Emotion Tags

You can add expressiveness to speech by inserting tags:

<laugh>, <chuckle>: For laughter sounds
<sigh>: For sighing sounds
<cough>, <sniffle>: For subtle interruptions
<groan>, <yawn>, <gasp>: For additional emotional expression
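Because a mistyped tag may simply be read aloud or ignored, it can help to check input text against the tag list above before synthesis. The sketch below does that with a regular expression; the helper name `find_unsupported_tags` is hypothetical, and it assumes tags are written in lowercase exactly as listed.

```python
import re

# Tag names copied from the list above.
SUPPORTED_TAGS = {
    "laugh", "chuckle", "sigh", "cough",
    "sniffle", "groan", "yawn", "gasp",
}

# Matches simple tags such as <laugh>; anything else in angle brackets is ignored.
TAG_PATTERN = re.compile(r"<([A-Za-z]+)>")


def find_unsupported_tags(text):
    """Return a sorted list of tag names in the text that are not in the supported set."""
    return sorted({tag for tag in TAG_PATTERN.findall(text) if tag not in SUPPORTED_TAGS})
```

For example, `find_unsupported_tags("Well <sigh> that was fun <laugh> <giggle>")` returns `["giggle"]`, while text that uses only the listed tags returns an empty list.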

Audio Samples

Listen to the model in action with different voices and emotions:

Default Voice Sample

Leah (Happy)

Tara (Sad)

Zac (Contemplative)

🔧 Technical Details

Property	Details
Model Type	Specialised token-to-audio sequence model
Parameters	~3 billion
Quantisation	8-bit (GGUF Q8_0 format)
Audio Sample Rate	24kHz
Input	Text with optional voice selection and emotion tags
Output	High-quality WAV audio
Language	English
Hardware Requirements	CUDA-compatible GPU (recommended: RTX series)
Integration Method	External LLM inference server + Orpheus-FastAPI frontend
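For downstream handling of the output, the following sketch uses Python's standard `wave` module to wrap raw samples in a 24kHz mono WAV container and to read back a clip's duration. The 16-bit sample width is an assumption; the table above specifies only the sample rate and the WAV format.

```python
import io
import wave

SAMPLE_RATE = 24_000  # 24kHz, from the table above
CHANNELS = 1          # mono
SAMPLE_WIDTH = 2      # assumption: 16-bit PCM samples (2 bytes each)


def pcm_to_wav_bytes(pcm):
    """Wrap raw PCM bytes in a WAV container using the parameters above."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(CHANNELS)
        writer.setsampwidth(SAMPLE_WIDTH)
        writer.setframerate(SAMPLE_RATE)
        writer.writeframes(pcm)
    return buffer.getvalue()


def wav_duration_seconds(wav_bytes):
    """Read the frame count and rate from a WAV file and return its length in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as reader:
        return reader.getnframes() / reader.getframerate()
```

At these settings one second of audio is 48,000 bytes of sample data, so `wav_duration_seconds` is a quick way to sanity-check that a generated file has the length you expect.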

Limitations

⚠️ Important Note

Currently supports English text only.

Best performance achieved on CUDA-compatible GPUs.

Generation speed depends on GPU capability.

📄 License

This model is available under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).

Citation & Attribution

The original Orpheus model was created by Canopy Labs. This repository contains a quantised version optimised for use with the Orpheus-FastAPI server.

If you use this quantised model in your research or applications, please cite:

@misc{orpheus-tts-2025,
  author = {Canopy Labs},
  title = {Orpheus-3b-0.1-ft: Text-to-Speech Model},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/canopylabs/orpheus-3b-0.1-ft}}
}

@misc{orpheus-quantised-2025,
  author = {Lex-au},
  title = {Orpheus-3b-FT-Q8_0: Quantised TTS Model with FastAPI Server},
  note = {GGUF quantisation of canopylabs/orpheus-3b-0.1-ft},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/lex-au/Orpheus-3b-FT-Q8_0.gguf}}
}
