A Japanese speech synthesis model based on the Orpheus-TTS architecture. By pruning 43% of the original layers, it achieves efficient inference while maintaining high-quality speech generation.
Model Features
- Efficient Inference: reduced from 28 layers to 16 (a 43% reduction), significantly lowering memory requirements and improving inference speed
- Multi-Voice Support: offers 14 different voices, including 8 high-quality voices (⭐⭐⭐ rating)
- Japanese Optimization: trained and optimized specifically for Japanese speech characteristics
- Real-Time Processing: inherits the low-latency characteristics of the original Orpheus model, suitable for streaming
Model Capabilities
- Japanese text-to-speech
- Multi-voice speech synthesis
- Real-time speech generation
Use Cases
- Entertainment
  - Game Character Voice Acting: generates real-time voices for Japanese game characters, with 14 character voice options
  - Audio Content Creation: automatically generates Japanese podcast or audiobook content, with support for switching between narrator voices
- Assistive Technology
  - Voice Assistants: provides natural speech output for Japanese voice assistants; low latency suits interactive scenarios
🚀 Slim-Orpheus 3B Japanese
Slim-Orpheus 3B Japanese is a text-to-speech model. It prunes the original weights to speed up inference and reduce memory requirements, and is trained on 14 Japanese voices.
✨ Features
Pruned the original weights from 28 down to 16 layers (a 43% reduction) to speed up inference and reduce memory requirements.
Trained in Japanese on 14 voices.
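The card does not document which 12 layers were removed, so the following is only a hypothetical sketch of the layer-count arithmetic, using an even-spacing heuristic to pick which decoder layers to keep:

```python
# Hypothetical sketch of depth pruning: 28 decoder layers -> 16.
# The actual Slim-Orpheus pruning recipe is not documented on this card;
# this only illustrates the 43% reduction with an even-spacing heuristic.

def prune_evenly(layers, keep):
    """Return `keep` layers spaced evenly across the original stack."""
    n = len(layers)
    indices = sorted({round(i * (n - 1) / (keep - 1)) for i in range(keep)})
    return [layers[i] for i in indices]

original = [f"decoder_layer_{i}" for i in range(28)]
slim = prune_evenly(original, keep=16)

reduction = (len(original) - len(slim)) / len(original)
print(len(slim), f"{reduction:.0%}")  # 16 layers, ~43% reduction
```

Evenly spaced retention keeps the first and last layers intact, which is a common heuristic in depth-pruning work, but again it is only an assumption here.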
💻 Voices
Below are sample outputs for each voice with quality indicators:
⭐⭐⭐ Good quality
⭐⭐ Okay quality
⭐ Poor quality
⚠️ Unstable
Lyney ⭐⭐⭐
Cyno ⭐⭐⭐
Tighnari ⭐⭐⭐
Kaeya ⭐⭐⭐
Neuvillette ⭐⭐⭐
Kaveh ⭐⭐⭐
Dehya ⭐⭐⭐
Yae Miko ⭐⭐⭐
Layla ⭐⭐
Yoimiya ⭐⭐
Alhaitham ⭐⭐
Zhongli ⭐⭐
Furina ⭐
Arataki Itto ⚠️
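Orpheus-style models condition on a voice by prefixing the text with the voice name. Assuming Slim-Orpheus follows the upstream `voice: text` prompt convention (an assumption; this card does not state the prompt format), voice selection can be sketched as:

```python
# Voice ratings transcribed from the list above (3 stars = good,
# 2 = okay, 1 = poor, 0 = unstable).
VOICES = {
    "Lyney": 3, "Cyno": 3, "Tighnari": 3, "Kaeya": 3,
    "Neuvillette": 3, "Kaveh": 3, "Dehya": 3, "Yae Miko": 3,
    "Layla": 2, "Yoimiya": 2, "Alhaitham": 2, "Zhongli": 2,
    "Furina": 1, "Arataki Itto": 0,
}

def format_prompt(voice: str, text: str) -> str:
    """Build a '{voice}: {text}' prompt (assumed upstream convention)."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice}")
    return f"{voice}: {text}"

good = [v for v, stars in VOICES.items() if stars == 3]
print(len(good))  # 8 high-quality voices
print(format_prompt("Lyney", "こんにちは、元気ですか？"))
```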
🔧 Technical Details
Limitations
Japanese Only: This model was trained specifically for the Japanese language and cannot speak English or other languages
No Emote Support: Not trained on the emote/emotional-cue tags that were available in the original model
Reduced Parameter Count: While offering faster inference, the reduction from 28 to 16 layers may impact some of the nuanced capabilities of the original Orpheus model
Voice Quality Varies: As noted in the voice quality ratings, some voices perform better than others
Orpheus-TTS Model Details
Code is available on GitHub: [CanopyAI/Orpheus-TTS](https://github.com/canopyai/Orpheus-TTS)
Orpheus TTS is a state-of-the-art, Llama-based Speech-LLM designed for high-quality, empathetic text-to-speech generation. This model has been finetuned to deliver human-level speech synthesis, achieving exceptional clarity, expressiveness, and real-time streaming performance.
Model Capabilities
Human-Like Speech: Natural intonation, emotion, and rhythm that is superior to SOTA closed-source models
Low Latency: ~200 ms streaming latency for real-time applications, reducible to ~100 ms with input streaming
Check out the Orpheus Colab ([link to Colab](https://colab.research.google.com/drive/1KhXT56UePPUHhqitJNUxq63k-pQomz3N?usp=sharing)) or GitHub ([link to GitHub](https://github.com/canopyai/Orpheus-TTS)) to see how to run easy inference on our finetuned models.
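The latency figures above come from consuming audio chunks as soon as they are generated rather than waiting for the full waveform. A minimal sketch of that pattern, with a stub standing in for the model's streaming generator (the real generator comes from the Orpheus inference code linked above):

```python
import time

def stream_chunks():
    """Stub for a streaming TTS generator; yields audio as it is produced."""
    for _ in range(5):
        time.sleep(0.01)      # simulate per-chunk generation time
        yield b"\x00" * 2048  # fake PCM audio bytes

start = time.perf_counter()
first_chunk_latency = None
audio = bytearray()
for chunk in stream_chunks():
    if first_chunk_latency is None:
        # Time-to-first-chunk is what the ~200 ms figure measures.
        first_chunk_latency = time.perf_counter() - start
    audio.extend(chunk)  # in a real app: play or forward immediately

print(f"first chunk after {first_chunk_latency * 1000:.0f} ms")
```

With real models, playback can begin as soon as the first chunk arrives, which is why time-to-first-chunk, not total synthesis time, determines perceived latency.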
📄 License
The model is under the apache-2.0 license.
📚 Documentation
Model Misuse
Do not use our models for impersonation without consent, misinformation or deception (including fake news or fraudulent calls), or any illegal or harmful activity. By using this model, you agree to follow all applicable laws and ethical guidelines. We disclaim responsibility for any use.