Orpheus 3B French Text-to-Speech Open-Source Model - Efficiently Generate Natural and Emotional Speech Synthesis Effects

Orpheus-3b-French-FT-Q8_0.gguf

Developed by lex-au

Orpheus is a high-performance text-to-speech model, fine-tuned specifically for natural emotional speech synthesis. This repository hosts the 8-bit quantized version of the 3-billion-parameter model, optimizing efficiency while maintaining high-quality output.

Speech Synthesis
Supports Multiple Languages
Open Source License: Apache-2.0
#French TTS #Emotional Speech Synthesis #8-bit Quantization

Downloads 101

Release Time: 4/18/2025

Model Overview

Orpheus is a high-performance text-to-speech model that converts text input into natural speech, supporting multiple voice tones and emotional expressions. The model has been quantized to an 8-bit (Q8_0) format for efficient inference, enabling it to run on consumer-grade hardware.

Model Features

Multi-Voice Support

Supports 3 distinct voice options: Pierre (male voice), Amelie (female voice), and Marie (female voice)

Emotional Expression

Supports emotional tags such as laughter and sighs to enhance speech expressiveness

Efficient Inference

8-bit quantization optimizes efficiency, enabling operation on consumer-grade hardware

High-Quality Audio

Generates high-quality 24 kHz mono audio

Model Capabilities

Text-to-Speech

Emotional Speech Synthesis

Multi-Voice Speech Generation

Use Cases

Speech Synthesis

Audiobook Generation

Convert French text into natural speech for audiobook production

Generates high-quality speech with emotional expression

Voice Assistants

Provide natural speech output for French voice assistants

Supports multiple voice tones and emotional expressions

🚀 Orpheus-3b-FT-Q8_0

Orpheus-3b-FT-Q8_0 is a high-performance Text-to-Speech model. It is a quantised version of canopylabs/3b-fr-ft-research_release, optimised for efficient inference on consumer hardware while maintaining high-quality speech output.

🚀 Quick Start

Download the Model

Download this quantised model from lex-au's Orpheus-FASTAPI collection.

Load the Model

Load the model in your preferred inference server and start the server.

Set up the Front-end

Clone the Orpheus-FastAPI repository:

git clone https://github.com/Lex-au/Orpheus-FastAPI.git
cd Orpheus-FastAPI

Configure the FastAPI server to connect to your inference server by setting the ORPHEUS_API_URL environment variable.
Follow the complete installation and setup instructions in the repository README.
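
The configuration step above relies on the ORPHEUS_API_URL environment variable. As a minimal sketch, the check below reads that variable and confirms it looks like an http(s) URL before the front-end is started. The helper name resolve_api_url and the example address are hypothetical and not part of Orpheus-FastAPI; the address your inference server actually listens on may differ.

```python
import os
from urllib.parse import urlparse

# Hypothetical example value; replace with the address of your own inference server.
EXAMPLE_URL = "http://127.0.0.1:1234/v1/completions"


def resolve_api_url(env=None):
    """Return ORPHEUS_API_URL from the environment, or raise if it is unusable."""
    env = os.environ if env is None else env
    url = env.get("ORPHEUS_API_URL", "").strip()
    if not url:
        raise ValueError("ORPHEUS_API_URL is not set")
    parsed = urlparse(url)
    # Only a basic shape check; it does not contact the server.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"ORPHEUS_API_URL is not an http(s) URL: {url!r}")
    return url


if __name__ == "__main__":
    print(resolve_api_url({"ORPHEUS_API_URL": EXAMPLE_URL}))
```

Passing a dictionary instead of reading os.environ directly keeps the check easy to exercise without touching the real environment.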

✨ Features

Multiple Voice Options: 3 distinct voice options with different characteristics.
Emotion Support: Support for emotion tags like laughter, sighs, etc.
GPU Acceleration: Optimised for CUDA acceleration on RTX GPUs.
High-Quality Audio: Produces high-quality 24kHz mono audio.
Naturalness: Fine-tuned for conversational naturalness.

📦 Installation

Compatible Inference Servers

This quantised model can be loaded into any of these LLM inference servers:

GPUStack - GPU-optimised LLM inference server (my pick) that supports LAN/WAN tensor-split parallelisation.
LM Studio - Load the GGUF model and start the local server.
llama.cpp server - Run with the appropriate model parameters.
Any OpenAI API-compatible server.

💻 Usage Examples

Available Voices

The model supports 3 different voices:

Pierre: Male, French, sophisticated.
Amelie: Female, French, elegant.
Marie: Female, French, spirited.

Emotion Tags

You can add expressiveness to speech by inserting tags:

<laugh>, <chuckle>: For laughter sounds.
<sigh>: For sighing sounds.
<cough>, <sniffle>: For subtle interruptions.
<groan>, <yawn>, <gasp>: For additional emotional expression.
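
Since a misspelled voice name or tag is easy to miss, a small pre-flight check can help before text is sent for synthesis. The sketch below uses only the voice names and emotion tags listed above. The helper names are hypothetical, and treating voices and tags as case-insensitive is an assumption; the server may be stricter.

```python
import re

# Voice names and emotion tags as listed in this model card.
VOICES = {"pierre", "amelie", "marie"}
EMOTION_TAGS = {"laugh", "chuckle", "sigh", "cough", "sniffle", "groan", "yawn", "gasp"}

# Matches simple angle-bracket tags such as <laugh>.
TAG_PATTERN = re.compile(r"<([a-zA-Z_]+)>")


def unknown_tags(text):
    """Return the tags in `text` that are not in the documented list, in order."""
    return [t for t in TAG_PATTERN.findall(text) if t.lower() not in EMOTION_TAGS]


def check_request(voice, text):
    """Validate a voice name and the tags in `text`; return the lowercased voice."""
    name = voice.strip().lower()  # lowercasing is an assumption, not documented behaviour
    if name not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    bad = unknown_tags(text)
    if bad:
        raise ValueError(f"unsupported emotion tags: {bad}")
    return name


if __name__ == "__main__":
    print(check_request("Pierre", "Quelle journée <sigh> enfin terminée <laugh>"))
```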

📚 Documentation

Model Description

Orpheus-3b-FT-Q8_0 is a 3-billion-parameter Text-to-Speech model that converts text inputs into natural-sounding speech with support for multiple voices and emotional expressions. The model has been quantised to 8-bit (Q8_0) format for efficient inference, making it accessible on consumer hardware.
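
To give a rough sense of what 8-bit quantisation means for storage, the sketch below estimates the weight footprint from the GGUF Q8_0 block layout, in which each block of 32 weights is stored as 32 one-byte integers plus one 2-byte scale (about 8.5 bits per weight). The parameter count is the nominal "~3 billion" from this card, not an exact figure, and the estimate ignores file metadata and any tensors kept at other precisions, so the real file size will differ.

```python
# Q8_0 stores weights in blocks of 32: 32 one-byte integers plus one
# 2-byte half-precision scale, i.e. 34 bytes per 32 weights.
BLOCK_WEIGHTS = 32
BLOCK_BYTES = 32 * 1 + 2


def q8_0_bytes(n_params):
    """Approximate storage in bytes for `n_params` weights held in Q8_0 blocks."""
    return n_params * BLOCK_BYTES / BLOCK_WEIGHTS


if __name__ == "__main__":
    nominal_params = 3_000_000_000  # "~3 billion" from the model card, not an exact count
    print(f"{q8_0_bytes(nominal_params) / 1e9:.2f} GB")
```

Under these assumptions the weights alone come to a little over 3 GB, which is why the model can fit on a single consumer GPU with room left for the inference server's working memory.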

Technical Specifications

Property	Details
Model Type	Specialised token-to-audio sequence model
Parameters	~3 billion
Quantisation	8-bit (GGUF Q8_0 format)
Audio Sample Rate	24kHz
Input	Text with optional voice selection and emotion tags
Output	High-quality WAV audio
Language	French
Hardware Requirements	CUDA-compatible GPU (recommended: RTX series)
Integration Method	External LLM inference server + Orpheus-FastAPI frontend
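
The output format in the table (WAV, 24kHz, mono) can be verified on a generated file with Python's standard wave module. The sketch below is one way to do that; the helper names are hypothetical, and the silent test clip with 16-bit samples is stand-in data, since the card does not state the bit depth.

```python
import io
import wave


def describe_wav(source):
    """Return (channels, sample_rate_hz, duration_seconds) for a WAV path or file object."""
    with wave.open(source, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    return channels, rate, duration


def matches_model_card(source):
    """True if the audio is mono at 24 kHz, as the specification table states."""
    channels, rate, _ = describe_wav(source)
    return channels == 1 and rate == 24_000


def silent_wav(seconds, rate=24_000):
    """Build an in-memory mono WAV of silence, used here only as stand-in data."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples; the bit depth is an assumption
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(describe_wav(silent_wav(0.5)))
    print(matches_model_card(silent_wav(0.5)))
```

To check real output, pass the path of a file produced by the server to matches_model_card instead of the synthetic clip.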

Limitations

Best performance is achieved on CUDA-compatible GPUs.
Generation speed depends on GPU capability.

📄 License

This model is available under the Apache License 2.0.

🔧 Technical Details

Citation & Attribution

The original Orpheus model was created by Canopy Labs. This repository contains a quantised version optimised for use with the Orpheus-FastAPI server.

If you use this quantised model in your research or applications, please cite:

@misc{orpheus-tts-2025,
  author = {Canopy Labs},
  title = {Orpheus-3b-0.1-ft: Text-to-Speech Model},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/canopylabs/orpheus-3b-0.1-ft}}
}

@misc{orpheus-quantised-2025,
  author = {Lex-au},
  title = {Orpheus-3b-FT-Q8_0: Quantised TTS Model with FastAPI Server},
  note = {GGUF quantisation of canopylabs/orpheus-3b-0.1-ft},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/lex-au/Orpheus-3b-FT-Q8_0.gguf}}
}
