A multilingual text-to-speech model fine-tuned from Orpheus-3b, optimized for African low-resource languages, with support for voice cloning and emotion synthesis
This is a memory-optimized, merged 16-bit build of Orpheus, fine-tuned with Unsloth and LoRA for expressive multilingual text-to-speech, particularly African low-resource languages.
Model Features
African Language Optimization
Specifically optimized for African low-resource languages such as Igbo, Yoruba, and Hausa
Voice Cloning
Supports personalized voice cloning, capable of mimicking specific speaker characteristics
Emotion Synthesis
Can generate speech with emotional features such as laughter and sighs
Efficient Inference
Trained with 4-bit quantization and LoRA for low memory usage; released here as a merged 16-bit model for efficient inference
Model Capabilities
Multilingual text-to-speech
Voice cloning
Emotional speech synthesis
Low-resource language support
Use Cases
Education
African Language Learning Aid
Provides pronunciation examples for learners of African languages
Generates natural and fluent speech in Igbo, Yoruba, etc.
Accessibility Technology
African Language Screen Reader
Provides text-to-speech services in African languages for visually impaired individuals
Supports voice output in multiple African languages
Media Production
Localized Content Dubbing
Provides localized dubbing for media content in African regions
Generates speech with local accents and cultural characteristics
🚀 Hypa_Orpheus-3b-0.1-ft (merged 16-bit)
A memory-efficient, merged 16-bit fine-tune of the Orpheus model, optimized for expressive multilingual text-to-speech, especially in low-resource African languages.
This model is a fine-tuned variant of canopylabs/orpheus-3b-0.1-ft, trained with Unsloth and LoRA. Key capabilities include:
Text-to-Speech generation
Speech synthesis for under-represented accents
Voice cloning & emotion synthesis
Research on multilingual low-resource voice AI
Model Details
Model Summary
This model was trained on a parallel text-speech dataset that includes over 300 hours (75k samples) of Nigerian-accented and low-resource language audio (Igbo, Yoruba, Hausa). A significant portion of the dataset comes from AfroVoices' transcription of real-world YouTube data (Random speaker, ~100+ hrs).
To maintain and enhance multilingual capabilities while preventing catastrophic forgetting, synthetic speech-text data sampled from the original 8 Orpheus voices using default emotional prompts was included.
The final training set also incorporated new speakers:
Eniola (40 hrs) – Female, bold, clear
Moyo (40 hrs) – Female, professional, articulate
Lovelyn (35 hrs) – Female, warm, shy
Precious (30 hrs) – Female, friendly, gentle
This model achieves state-of-the-art performance on low-resource multilingual TTS tasks across African languages (see training stats below).
Base Model Details
The default Orpheus-TTS model released by Canopy Labs supports the following voices and emotions:
Voices: tara, leah, jess, leo, dan, mia, zac, and zoe.
Emotions: <laugh>, <chuckle>, <sigh>, <cough>, <sniffle>, <groan>, <yawn>, and <gasp>.
Because this synthetic data was included during fine-tuning, our model retains these voices and emotions. For more information on voices and emotions, please visit the default model's card.
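The base model's convention is a simple `voice: text` prompt, with emotion tags embedded inline in the text. A minimal sketch (the voice and tag names come from the lists above; the exact string layout follows the base Orpheus convention):

```python
# Build an Orpheus-style TTS prompt: "<voice>: <text>", with emotion tags inline.
voice = "tara"  # any of the retained voices listed above
text = "Well, that was unexpected <laugh> let's try again."
prompt = f"{voice}: {text}"
print(prompt)  # -> tara: Well, that was unexpected <laugh> let's try again.
```

The same format applies to the new speakers added in fine-tuning (e.g. `Eniola`, `Lovelyn`).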
Our Model Sample Generations
🎧 Listen to samples generated by Hypa Orpheus-TTS
| Text Input | Audio Output | Language | Voice |
|---|---|---|---|
| I am cooking for guests tomorrow and need to know how to make aioli. Can you give me a step-by-step recipe? | *(audio sample)* | English | Emmanuel |
| Ina dafa abinci don bakin gobe kuma ina bukatar sanin yadda ake yin ailoli. Za ka iya ba ni girke-gireken matakan daya bayan daya? | *(audio sample)* | Hausa | Emmanuel |
| Ina dafa abinci don bakin gobe kuma ina bukatar sanin yadda ake yin ailoli. Za ka iya ba ni girke-gireken matakan daya bayan daya? | *(audio sample)* | Hausa | Eniola |
| Èmi máa se oúnjẹ fún àwọn àlejò l'ọ́la mo sì nílò láti mọ bí wọn ti ńṣe aioli. Ṣe o lè fún mi ni àwọn ìlànà ìdáná ẹlẹ́sẹẹsẹ? | *(audio sample)* | Yoruba | Eniola |
| I am cooking for guests tomorrow and need to know how to make aioli. Can you give me a step-by-step recipe? | *(audio sample)* | English | Eniola |
| M na-esi nri maka ndị ọbịa echi ma achọ ịmata otú esi esi aioli. Ị nwere ike inye m usoro ntụziaka? | *(audio sample)* | Igbo | Eniola |
| M na-esi nri maka ndị ọbịa echi ma achọ ịmata otú esi esi aioli. Ị nwere ike inye m usoro ntụziaka? | *(audio sample)* | Igbo | Lovelyn |
| I am cooking for guests tomorrow and need to know how to make aioli. Can you give me a step-by-step recipe? | *(audio sample)* | English | Lovelyn |
📦 Training Details
Training Summary
Base model: canopylabs/orpheus-3b-0.1-ft
Training engine: Unsloth + LoRA
LoRA config: r=1024, alpha=1024, dropout=0.0, full attention + FFN adaptation
Quantization: 4-bit (bnb) for training; final model is highly memory-efficient
Dataset: parallel text-audio pairs for African-accented English, Igbo, Yoruba, and Hausa
Total Hours: 300+ (multi-accent)
Key Speakers: 45+ unique voices (see speaker distribution chart below)
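The LoRA settings listed above can be expressed as the keyword arguments one would pass to Unsloth's `FastLanguageModel.get_peft_model` (the call itself is omitted here since it needs a GPU and the base model; the `target_modules` list is the standard set covering "full attention + FFN adaptation"):

```python
# LoRA hyperparameters from the training summary above, as get_peft_model kwargs.
lora_kwargs = dict(
    r=1024,            # LoRA rank
    lora_alpha=1024,   # scaling factor (alpha == r here)
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # full attention adaptation
        "gate_proj", "up_proj", "down_proj",     # feed-forward (FFN) adaptation
    ],
)
```

The base model itself was loaded in 4-bit (bitsandbytes) during training, which is what keeps the memory footprint low at this unusually high rank.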
We plan to open-source the full dataset shortly, similar to the Hypa_Fleurs initiative.
📄 Licensing and Citation
This model is released under an Open Source License (apache-2.0). Please refer to the LICENSE file for full details.
When using this model in your work, please cite both this model and the base canopylabs/orpheus-3b-0.1-ft model as follows:
@misc{canopylabsorpheus,
  title={Orpheus-3b-0.1-ft: A Multilingual Text-to-Speech Model},
  author={Canopy Labs},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/canopylabs/orpheus-3b-0.1-ft}},
  note={Fine-tuned version of Orpheus for expressive TTS}
}
@misc{hypaorpheus4bit,
  title={Hypa_Orpheus-3b-0.1-ft (LoRA-4bit)},
  author={Hypa AI},
  year={2025},
  note={Fine-tuned Orpheus TTS on African languages},
  url={https://huggingface.co/hypaai/Hypa_Orpheus-3b-0.1-ft-unsloth-bnb-4bit}
}
👏 Acknowledgements
Canopy Labs Team: For creating the foundational model and open-sourcing it.
AfroVoices Experts: For their translation expertise and high-quality datasets.
Community Support: We thank all supporters, contributors, and users.
📞 Contact and Contributions
For any questions, issues, or contributions, please open an issue in this repository or contact hypa.ai.ng@gmail.com. Contributions are welcome!
🌟 Closing Remarks
By making Hypa_Orpheus available, we hope to empower research and development in multilingual speech technologies for African languages.
Hypa AI remains steadfast in its mission to pioneer intelligent solutions that are not just technologically advanced but are also culturally aware, ensuring that the future of AI is as diverse and inclusive as the world it serves.
AfroVoices, a subsidiary of Hypa AI, is dedicated to amplifying African voices, languages, and cultures in the intelligence age. Focused on bridging the digital representation gap, AfroVoices curates datasets and resources for African languages, promoting inclusivity and cultural appreciation in AI technologies. Their mission goes beyond technological innovation, aiming to celebrate the richness of African linguistic diversity on a global stage.
💻 Usage
Unsloth Inference
Download the needed packages.
```python
%%capture
import os
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth
else:
    # Do this only in Colab notebooks! Otherwise use pip install unsloth
    !pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl==0.15.2 triton cut_cross_entropy unsloth_zoo
    !pip install sentencepiece protobuf datasets huggingface_hub hf_transfer
    !pip install --no-deps unsloth
!pip install snac
```
Download the models (both the SNAC encoder/decoder as well as our finetuned Hypa_Orpheus).
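A minimal sketch of this step, wrapped in a function since it downloads weights and needs a GPU. The 4-bit repo id comes from the citation above; treat it as a placeholder and substitute the id of the checkpoint you actually want (e.g. the merged 16-bit one from this card). `snac_24khz` is the 24 kHz SNAC codec the base Orpheus model pairs with:

```python
# Repo ids (the TTS id below is the 4-bit checkpoint cited above; swap in the
# merged 16-bit repo id if you are using this card's merged weights).
TTS_REPO = "hypaai/Hypa_Orpheus-3b-0.1-ft-unsloth-bnb-4bit"
SNAC_REPO = "hubertsiuzdak/snac_24khz"  # SNAC audio codec used by Orpheus

def load_models(max_seq_length: int = 2048):
    """Download and load the TTS model (via Unsloth) and the SNAC codec."""
    from unsloth import FastLanguageModel
    from snac import SNAC

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=TTS_REPO,
        max_seq_length=max_seq_length,  # room for text prompt + audio tokens
        load_in_4bit=True,              # set False for merged 16-bit weights
    )
    FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference mode
    snac_model = SNAC.from_pretrained(SNAC_REPO)
    return model, tokenizer, snac_model
```

At inference time, the language model generates audio tokens from the `voice: text` prompt, and the SNAC decoder turns those tokens back into a waveform.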