# 🚀 Uploaded Model

This is an uploaded Llama-based model for high-quality text-to-speech generation. It was finetuned by Prince-1 from unsloth/orpheus-3b-0.1-ft-unsloth-bnb-4bit, is licensed under Apache 2.0, and was trained 2x faster with Unsloth and Hugging Face's TRL library.

## Model Information

| Property | Details |
|---|---|
| Finetuned by | Prince-1 |
| License | apache-2.0 |
| Finetuned from model | unsloth/orpheus-3b-0.1-ft-unsloth-bnb-4bit |
| Base Model | unsloth/orpheus-3b-0.1-ft-unsloth-bnb-4bit |
| Tags | text-generation-inference, transformers, unsloth, llama, trl, tts, text-to-speech, gguf, llama-cpp-python |
| Library Name | transformers |
| Language | en |
| Datasets | MrDragonFox/Elise |
## Model Features

Orpheus TTS is a state-of-the-art, Llama-based Speech-LLM designed for high-quality, empathetic text-to-speech generation. This model has been finetuned to deliver human-level speech synthesis with exceptional clarity, expressiveness, and real-time streaming performance.
## ✨ Features

### Model Capabilities
- Human-Like Speech: Natural intonation, emotion, and rhythm superior to SOTA closed-source models.
- Zero-Shot Voice Cloning: Clone voices without prior fine-tuning.
- Guided Emotion and Intonation: Control speech and emotion characteristics with simple tags (see the sketch below).
- Low Latency: ~200 ms streaming latency for real-time applications, reducible to ~100 ms with input streaming.
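A minimal sketch of how a tagged prompt might be assembled. The `voice: text` layout and the inline tag names (`<laugh>`, `<sigh>`) follow the upstream Orpheus TTS project and should be treated as assumptions for this particular finetune:

```python
# Build a prompt with inline emotion tags (tag set assumed from upstream
# Orpheus TTS; verify which tags this finetune was trained on).
voice = "tara"
text = "Well, that was unexpected <laugh> but let's keep going <sigh>."
prompt = f"{voice}: {text}"
print(prompt)
```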
## 📦 Installation

The model has been converted to GGUF format. When converting, you can choose the `quantization_method` from the options below (see the conversion sketch after this list):
- not_quantized: Recommended. Fast conversion. Slow inference, big files.
- fast_quantized: Recommended. Fast conversion. OK inference, OK file size.
- quantized: Recommended. Slow conversion. Fast inference, small files.
- f32: Not recommended. Retains 100% accuracy, but super slow and memory hungry.
- f16: Fastest conversion + retains 100% accuracy. Slow and memory hungry.
- q8_0: Fast conversion. High resource use, but generally acceptable.
- q4_k_m: Recommended. Uses Q6_K for half of the `attention.wv` and `feed_forward.w2` tensors, else Q4_K.
- q5_k_m: Recommended. Uses Q6_K for half of the `attention.wv` and `feed_forward.w2` tensors, else Q5_K.
- q2_k: Uses Q4_K for the `attention.wv` and `feed_forward.w2` tensors, Q2_K for the other tensors.
- q3_k_l: Uses Q5_K for the `attention.wv`, `attention.wo`, and `feed_forward.w2` tensors, else Q3_K.
- q3_k_m: Uses Q4_K for the `attention.wv`, `attention.wo`, and `feed_forward.w2` tensors, else Q3_K.
- q3_k_s: Uses Q3_K for all tensors.
- q4_0: Original quant method, 4-bit.
- q4_1: Higher accuracy than q4_0 but not as high as q5_0. However, has quicker inference than q5 models.
- q4_k_s: Uses Q4_K for all tensors.
- q4_k: Alias for q4_k_m.
- q5_k: Alias for q5_k_m.
- q5_0: Higher accuracy, higher resource usage and slower inference.
- q5_1: Even higher accuracy and resource usage, and slower inference.
- q5_k_s: Uses Q5_K for all tensors.
- q6_k: Uses Q8_K for all tensors.
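For reference, a minimal sketch of how such a GGUF export is typically done with Unsloth's `save_pretrained_gguf` helper. The output directory and the choice of `q4_k_m` are illustrative assumptions, not values prescribed by this card:

```python
from unsloth import FastLanguageModel

# Load the finetuned model (4-bit load keeps memory low during export).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/orpheus-3b-0.1-ft-unsloth-bnb-4bit",
    load_in_4bit=True,
)

# Export to GGUF; "q4_k_m" is one of the quantization_method options above.
model.save_pretrained_gguf("orpheus-gguf", tokenizer, quantization_method="q4_k_m")
```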
## 💻 Usage Examples

### Basic Usage

```python
# The package is installed as `llama-cpp-python` but imported as `llama_cpp`.
from llama_cpp import Llama
```
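A slightly fuller sketch of loading and prompting the GGUF file. The filename, context size, and prompt here are illustrative assumptions, and turning Orpheus's generated audio tokens into a waveform requires the codec/vocoder from the upstream project, which is out of scope here:

```python
from llama_cpp import Llama

# Hypothetical filename; substitute the quantization you actually exported.
llm = Llama(model_path="orpheus-3b-0.1-ft-q4_k_m.gguf", n_ctx=2048)

# Orpheus generates audio tokens rather than plain text; decoding them into
# audio requires the upstream codec/vocoder (not shown).
output = llm("tara: Hello! This is a quick Orpheus TTS test.", max_tokens=512)
print(output["choices"][0]["text"])
```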
## 📄 License

This model is licensed under the Apache 2.0 license.
## ⚠️ Important Note

Do not use our models for impersonation without consent, misinformation or deception (including fake news or fraudulent calls), or any illegal or harmful activity. By using this model, you agree to follow all applicable laws and ethical guidelines. We disclaim responsibility for any misuse.