Orpheus is a high-performance text-to-speech model fine-tuned for natural, emotional speech synthesis. This repository hosts a quantized (GGUF Q2_K) version of the 3-billion-parameter model, reducing resource requirements while maintaining high-quality output.
Orpheus-3b-FT-Q2_K is a 3-billion-parameter text-to-speech model that converts text input into natural speech, supporting multiple voices and emotional expressions. The model has been quantized to the 2-bit GGUF Q2_K format for efficient inference, enabling it to run on consumer-grade hardware.
Model Features
Multi-voice Support: Offers 8 distinctive voice options to suit different scenarios
Emotional Expression: Supports emotion tags such as laughter and sighs to enhance speech naturalness
Efficient Inference: Quantized (Q2_K) for efficient operation on consumer-grade hardware
High-Quality Output: Generates high-quality 24kHz mono audio
Model Capabilities
Text-to-Speech
Emotional Speech Synthesis
Multi-voice Selection
Audio Generation
Use Cases
Voice Interaction
Virtual Assistants: Provides natural speech output for virtual assistants, making voice interactions more natural and enhancing the user experience
Audiobooks: Converts text content into audiobooks, with multiple voice options to suit different content styles
Entertainment Applications
Game Voiceovers: Generates dynamic voices for game characters, with emotion tag support to enhance immersion
🚀 Orpheus-3b-FT-Q2_K
Orpheus-3b-FT-Q2_K is a quantized text-to-speech model offering efficient, high-quality speech synthesis with support for multiple voices and emotional expressions.
✨ Features
Multiple Voice Options: Provides 8 distinct voices, each with different characteristics.
Emotion Tag Support: Supports emotion tags like laughter, sighs, etc., adding expressiveness to speech.
CUDA Acceleration: Optimized for CUDA acceleration on RTX GPUs, enabling efficient inference.
High-Quality Audio: Produces high-quality 24kHz mono audio.
Conversational Naturalness: Fine-tuned for conversational naturalness.
📦 Installation
This quantized model can be loaded into any GGUF-compatible LLM inference server. It is designed to be used with an inference server that connects to the [Orpheus-FastAPI](https://github.com/Lex-au/Orpheus-FastAPI) frontend, which provides both a web UI and OpenAI-compatible API endpoints.
Quick Start
1. Download this quantized model from [lex-au's Orpheus-FastAPI collection](https://huggingface.co/collections/lex-au/orpheus-fastapi-67e125ae03fc96dae0517707).
2. Load the model in your preferred inference server and start the server.
3. Configure the FastAPI server to connect to your inference server by setting the ORPHEUS_API_URL environment variable.
4. Follow the complete installation and setup instructions in the [repository README](https://github.com/Lex-au/Orpheus-FastAPI); a minimal example request is sketched below.
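
Once both servers are running, speech can be requested through the frontend's OpenAI-compatible API. The sketch below is a minimal Python example under two assumptions: that the Orpheus-FastAPI frontend is listening on http://localhost:5005 and that it exposes an OpenAI-style /v1/audio/speech route. Confirm the actual host, port, and endpoint in the repository README.

```python
# Minimal sketch: request speech from the Orpheus-FastAPI frontend.
# Assumptions (verify against the Orpheus-FastAPI README): the frontend
# listens on localhost:5005 and exposes an OpenAI-compatible
# /v1/audio/speech endpoint.
import requests

response = requests.post(
    "http://localhost:5005/v1/audio/speech",
    json={
        "model": "orpheus",                      # model name is an assumption
        "input": "Hello! This is Orpheus speaking.",
        "voice": "tara",                         # any of the 8 voices listed below
    },
    timeout=120,
)
response.raise_for_status()

# The model produces 24kHz mono WAV audio.
with open("output.wav", "wb") as f:
    f.write(response.content)
```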
Audio Samples
You can listen to the model in action with different voices and emotions; the available samples include a default voice sample, Leah (happy), Tara (sad), and Zac (contemplative).
Available Voices
The model supports 8 different voices:
tara: Female, conversational, clear
leah: Female, warm, gentle
jess: Female, energetic, youthful
leo: Male, authoritative, deep
dan: Male, friendly, casual
mia: Female, professional, articulate
zac: Male, enthusiastic, dynamic
zoe: Female, calm, soothing
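
As a usage illustration only, the sketch below loops over the voice names above and saves one sample per voice, reusing the assumed endpoint and payload shape from the Quick Start example.

```python
# Sketch: generate one short sample per available voice.
# Endpoint and payload shape are the same assumptions as in the Quick Start.
import requests

VOICES = ["tara", "leah", "jess", "leo", "dan", "mia", "zac", "zoe"]

for voice in VOICES:
    resp = requests.post(
        "http://localhost:5005/v1/audio/speech",  # assumed default address
        json={"model": "orpheus", "input": f"Hi, my name is {voice}.", "voice": voice},
        timeout=120,
    )
    resp.raise_for_status()
    with open(f"sample_{voice}.wav", "wb") as f:
        f.write(resp.content)
```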
Emotion Tags
You can add expressiveness to speech by inserting tags:
<laugh>, <chuckle>: For laughter sounds
<sigh>: For sighing sounds
<cough>, <sniffle>: For subtle interruptions
<groan>, <yawn>, <gasp>: For additional emotional expression
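
Emotion tags are written inline in the text you send to the model. A hedged example, again assuming the same OpenAI-compatible endpoint as in the Quick Start sketch:

```python
# Sketch: emotion tags are embedded directly in the input text.
import requests

text = "I stayed up way too late reading <yawn> ... <laugh> it was worth it, though."

resp = requests.post(
    "http://localhost:5005/v1/audio/speech",      # assumed default address
    json={"model": "orpheus", "input": text, "voice": "leah"},
    timeout=120,
)
resp.raise_for_status()
with open("emotive.wav", "wb") as f:
    f.write(resp.content)
```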
🔧 Technical Details
| Property | Details |
|----------|---------|
| Architecture | Specialised token-to-audio sequence model |
| Parameters | ~3 billion |
| Quantisation | 2-bit (GGUF Q2_K format) |
| Audio Sample Rate | 24kHz |
| Input | Text with optional voice selection and emotion tags |
| Output | High-quality WAV audio |
| Language | English |
| Hardware Requirements | CUDA-compatible GPU (recommended: RTX series) |
| Integration Method | External LLM inference server + Orpheus-FastAPI frontend |
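
To sanity-check a generated file against the format listed above (24kHz mono WAV), a small script using Python's standard wave module is enough; the file name here is just a placeholder.

```python
# Sketch: verify a generated file matches the documented output format.
# "output.wav" is a placeholder file name.
import wave

with wave.open("output.wav", "rb") as wav:
    assert wav.getframerate() == 24000, "expected a 24kHz sample rate"
    assert wav.getnchannels() == 1, "expected mono audio"
    duration = wav.getnframes() / wav.getframerate()
    print(f"OK: {duration:.2f}s of 24kHz mono audio")
```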
📚 Documentation
Limitations
Currently supports English text only.
Best performance achieved on CUDA-compatible GPUs.
Generation speed depends on GPU capability.
License
This model is available under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
Citation & Attribution
The original Orpheus model was created by Canopy Labs. This repository contains a quantized version optimised for use with the Orpheus-FastAPI server.
If you use this quantized model in your research or applications, please cite:
@misc{orpheus-tts-2025,
  author = {Canopy Labs},
  title = {Orpheus-3b-0.1-ft: Text-to-Speech Model},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/canopylabs/orpheus-3b-0.1-ft}}
}
@misc{orpheus-quantised-2025,
  author = {Lex-au},
  title = {Orpheus-3b-FT-Q2_K: Quantised TTS Model with FastAPI Server},
  note = {GGUF quantisation of canopylabs/orpheus-3b-0.1-ft},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/lex-au/Orpheus-3b-FT-Q4_K_M.gguf}}
}