🚀 Orpheus-3b-FT-Q8_0
Orpheus-3b-FT-Q8_0 is a quantised Text-to-Speech model that efficiently converts text into high-quality, natural speech, supporting multiple voices and emotional expressions.
🚀 Quick Start
Download the Model
- Download this quantised model from lex-au's Orpheus-FASTAPI collection.
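If you use the Hugging Face CLI (part of the huggingface_hub package), one way to fetch the files is sketched below; the repo ID is taken from the citation at the bottom of this page, and the local directory is just an example.

```bash
# Download the quantised GGUF from the Hugging Face Hub.
# Repo ID taken from the citation below; --local-dir is only an example path.
huggingface-cli download lex-au/Orpheus-3b-FT-Q8_0.gguf --local-dir ./models
```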
Load the Model
- Load the model in your preferred inference server and start it. Compatible inference servers include:
- GPUStack - GPU-optimised LLM inference server (my pick) - supports LAN/WAN tensor-split parallelisation
- LM Studio - Load the GGUF model and start the local server
- llama.cpp server - Run with the appropriate model parameters (see the example command after this list)
- Any OpenAI API-compatible server
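For the llama.cpp option, a launch command might look like the sketch below; the filename, host, port, context size and GPU-offload layer count are assumptions to adapt to your download and hardware.

```bash
# Sketch of a llama.cpp server launch for this model.
# Filename, host, port, context size and -ngl value are assumptions.
./llama-server \
  -m ./models/Orpheus-3b-FT-Q8_0.gguf \
  --host 0.0.0.0 \
  --port 5006 \
  -c 8192 \
  -ngl 99
```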
Set up the Frontend
- Clone the Orpheus-FastAPI repository:
git clone https://github.com/Lex-au/Orpheus-FastAPI.git
cd Orpheus-FastAPI
- Configure the FastAPI server to connect to your inference server by setting the ORPHEUS_API_URL environment variable (see the example after these steps).
- Follow the complete installation and setup instructions in the repository README.
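As a minimal sketch, the variable can be exported like this; the URL assumes LM Studio's default local port and should be replaced with whatever endpoint your inference server actually exposes.

```bash
# Point the Orpheus-FastAPI frontend at your inference server.
# The URL below assumes LM Studio's default port (1234); adjust for
# GPUStack, llama.cpp server, or any other OpenAI-compatible endpoint.
export ORPHEUS_API_URL="http://127.0.0.1:1234/v1/completions"
```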
✨ Features
- Multiple Voice Options: Two distinct voices (유나 and 준서), each with different characteristics.
- Emotion Tag Support: Supports emotion tags like laughter, sighs, etc., to add expressiveness to speech.
- CUDA Acceleration: Optimised for CUDA acceleration on RTX GPUs.
- High-Quality Audio: Produces high-quality 24kHz mono audio.
- Conversational Naturalness: Fine-tuned for conversational naturalness.
📦 Installation
This model is designed to be used with an LLM inference server that connects to the [Orpheus-FastAPI](https://github.com/Lex-au/Orpheus-FastAPI) frontend. You can load the quantised model into any of the inference servers listed in the Quick Start section above.
💻 Usage Examples
Available Voices
The model supports two different voices:
- 유나: Female, Korean, melodic
- 준서: Male, Korean, confident
Emotion Tags
You can add expressiveness to speech by inserting tags:
- `<laugh>`, `<chuckle>`: For laughter sounds
- `<sigh>`: For sighing sounds
- `<cough>`, `<sniffle>`: For subtle interruptions
- `<groan>`, `<yawn>`, `<gasp>`: For additional emotional expression
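As an illustration, a request that selects a voice and embeds emotion tags might look like the sketch below. The endpoint path, port and JSON field names are assumptions modelled on the OpenAI-style speech API that the Orpheus-FastAPI README describes; check that README for the exact request format.

```bash
# Sketch of a speech request with a voice and inline emotion tags.
# Endpoint, port and field names are assumptions - see the Orpheus-FastAPI
# README for the API actually exposed by your installation.
curl -X POST "http://localhost:5005/v1/audio/speech" \
  -H "Content-Type: application/json" \
  -d '{
        "input": "오늘 정말 재미있었어요 <laugh> 다음에 또 만나요!",
        "voice": "유나",
        "response_format": "wav"
      }' \
  --output speech.wav
```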
📚 Documentation
Model Description
Orpheus-3b-FT-Q8_0 is a 3 billion parameter Text-to-Speech model. It converts text inputs into natural-sounding speech, supporting multiple voices and emotional expressions. The model has been quantised to 8-bit (Q8_0) format for efficient inference, making it accessible on consumer hardware.
🔧 Technical Details
| Property | Details |
|----------|---------|
| Model Type | Specialised token-to-audio sequence model |
| Parameters | ~3 billion |
| Quantisation | 8-bit (GGUF Q8_0 format) |
| Audio Sample Rate | 24kHz |
| Input | Text with optional voice selection and emotion tags |
| Output | High-quality WAV audio |
| Language | Korean |
| Hardware Requirements | CUDA-compatible GPU (recommended: RTX series) |
| Integration Method | External LLM inference server + Orpheus-FastAPI frontend |
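To confirm that a generated file matches the 24kHz mono WAV output listed above, a quick check with ffprobe (from FFmpeg) looks like this; the filename is just an example.

```bash
# Inspect the generated audio; expect a 24000 Hz, mono WAV stream.
ffprobe -hide_banner speech.wav
```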
🚫 Limitations
- Best performance achieved on CUDA-compatible GPUs.
- Generation speed depends on GPU capability.
📄 License
This model is available under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
Citation & Attribution
The original Orpheus model was created by Canopy Labs. This repository contains a quantised version optimised for use with the Orpheus-FastAPI server.
If you use this quantised model in your research or applications, please cite:
@misc{orpheus-tts-2025,
author = {Canopy Labs},
title = {Orpheus-3b-0.1-ft: Text-to-Speech Model},
year = {2025},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/canopylabs/orpheus-3b-0.1-ft}}
}
@misc{orpheus-quantised-2025,
author = {Lex-au},
title = {Orpheus-3b-FT-Q8_0: Quantised TTS Model with FastAPI Server},
note = {GGUF quantisation of canopylabs/orpheus-3b-0.1-ft},
year = {2025},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/lex-au/Orpheus-3b-FT-Q8_0.gguf}}
}