A multilingual text-to-speech model fine-tuned from Orpheus-3b, optimized for African low-resource languages, with support for voice cloning and emotion synthesis
This is a memory-optimized, merged 16-bit build of Orpheus, fine-tuned with Unsloth and LoRA for expressive multilingual text-to-speech, particularly African low-resource languages.
Model Features
African Language Optimization
Specifically optimized for African low-resource languages such as Igbo, Yoruba, and Hausa
Voice Cloning
Supports personalized voice cloning, capable of mimicking specific speaker characteristics
Emotion Synthesis
Can generate speech with emotional features such as laughter and sighs
Efficient Inference
Trained with 4-bit quantization and LoRA for low memory usage; released here as a merged 16-bit model for efficient inference
Model Capabilities
Multilingual text-to-speech
Voice cloning
Emotional speech synthesis
Low-resource language support
Use Cases
Education
African Language Learning Aid
Provides pronunciation examples for learners of African languages
Generates natural and fluent speech in Igbo, Yoruba, etc.
Accessibility Technology
African Language Screen Reader
Provides text-to-speech services in African languages for visually impaired individuals
Supports voice output in multiple African languages
Media Production
Localized Content Dubbing
Provides localized dubbing for media content in African regions
Generates speech with local accents and cultural characteristics
🚀 Hypa_Orpheus-3b-0.1-ft (merged 16-bit)
A memory-efficient, merged 16-bit fine-tune of the Orpheus model, optimized for expressive multilingual text-to-speech, especially in low-resource African languages.
This model is a fine-tuned variant of canopylabs/orpheus-3b-0.1-ft, trained with Unsloth and LoRA. Key capabilities include:
Text-to-Speech generation
Speech synthesis for under-represented accents
Voice cloning & emotion synthesis
Research on multilingual low-resource voice AI
Model Details
Model Summary
This model was trained on a parallel text-speech dataset that includes over 300 hours (75k samples) of Nigerian-accented and low-resource language audio (Igbo, Yoruba, Hausa). A significant portion of the dataset comes from AfroVoices' transcription of real-world YouTube data (Random speaker, ~100+ hrs).
To maintain and enhance multilingual capabilities while preventing catastrophic forgetting, synthetic speech-text data sampled from the original 8 Orpheus voices using default emotional prompts was included.
The final training set also incorporated new speakers:
Eniola (40 hrs) – Female, bold, clear
Moyo (40 hrs) – Female, professional, articulate
Lovelyn (35 hrs) – Female, warm, shy
Precious (30 hrs) – Female, friendly, gentle
This model achieves state-of-the-art performance on low-resource multilingual TTS tasks across African languages (see training stats below).
Base Model Details
The default Orpheus-TTS model released by Canopy Labs supports the following voices and emotions:
Voices: tara, leah, jess, leo, dan, mia, zac, and zoe.
Emotions: <laugh>, <chuckle>, <sigh>, <cough>, <sniffle>, <groan>, <yawn>, and <gasp>.
Because this synthetic data was included during fine-tuning, our model retains these voices and emotions. For more information on voices and emotions, please visit the default model's card.
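The base model's convention is a simple `voice: text` prompt, with emotion tags embedded inline in the text. A minimal sketch (the voice and tag names come from the lists above; the exact string layout follows the base Orpheus convention):

```python
# Build an Orpheus-style TTS prompt: "<voice>: <text>", with emotion tags inline.
voice = "tara"  # any of the retained voices listed above
text = "Well, that was unexpected <laugh> let's try again."
prompt = f"{voice}: {text}"
print(prompt)  # -> tara: Well, that was unexpected <laugh> let's try again.
```

The same format applies to the new speakers added in fine-tuning (e.g. `Eniola`, `Lovelyn`).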
Our Model Sample Generations
🎧 Listen to samples generated by Hypa Orpheus-TTS
| Text Input | Audio Output | Language | Voice |
|---|---|---|---|
| I am cooking for guests tomorrow and need to know how to make aioli. Can you give me a step-by-step recipe? | *(audio sample)* | English | Emmanuel |
| Ina dafa abinci don bakin gobe kuma ina bukatar sanin yadda ake yin ailoli. Za ka iya ba ni girke-gireken matakan daya bayan daya? | *(audio sample)* | Hausa | Emmanuel |
| Ina dafa abinci don bakin gobe kuma ina bukatar sanin yadda ake yin ailoli. Za ka iya ba ni girke-gireken matakan daya bayan daya? | *(audio sample)* | Hausa | Eniola |
| Èmi máa se oúnjẹ fún àwọn àlejò l'ọ́la mo sì nílò láti mọ bí wọn ti ńṣe aioli. Ṣe o lè fún mi ni àwọn ìlànà ìdáná ẹlẹ́sẹẹsẹ? | *(audio sample)* | Yoruba | Eniola |
| I am cooking for guests tomorrow and need to know how to make aioli. Can you give me a step-by-step recipe? | *(audio sample)* | English | Eniola |
| M na-esi nri maka ndị ọbịa echi ma achọ ịmata otú esi esi aioli. Ị nwere ike inye m usoro ntụziaka? | *(audio sample)* | Igbo | Eniola |
| M na-esi nri maka ndị ọbịa echi ma achọ ịmata otú esi esi aioli. Ị nwere ike inye m usoro ntụziaka? | *(audio sample)* | Igbo | Lovelyn |
| I am cooking for guests tomorrow and need to know how to make aioli. Can you give me a step-by-step recipe? | *(audio sample)* | English | Lovelyn |
📦 Training Details
Training Summary
Base model: canopylabs/orpheus-3b-0.1-ft
Training engine: Unsloth + LoRA
LoRA config: r=1024, alpha=1024, dropout=0.0, full attention + FFN adaptation
Quantization: 4-bit (bnb) for training; final model is highly memory-efficient
Dataset: parallel text-audio pairs for African-accented English, Igbo, Yoruba, and Hausa
Total Hours: 300+ (multi-accent)
Key Speakers: 45+ unique voices (see speaker distribution chart below)
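The LoRA settings listed above can be expressed as the keyword arguments one would pass to Unsloth's `FastLanguageModel.get_peft_model` (the call itself is omitted here since it needs a GPU and the base model; the `target_modules` list is the standard set covering "full attention + FFN adaptation"):

```python
# LoRA hyperparameters from the training summary above, as get_peft_model kwargs.
lora_kwargs = dict(
    r=1024,            # LoRA rank
    lora_alpha=1024,   # scaling factor (alpha == r here)
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # full attention adaptation
        "gate_proj", "up_proj", "down_proj",     # feed-forward (FFN) adaptation
    ],
)
```

The base model itself was loaded in 4-bit (bitsandbytes) during training, which is what keeps the memory footprint low at this unusually high rank.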
We plan to open-source the full dataset shortly, similar to the Hypa_Fleurs initiative.
📄 Licensing and Citation
This model is released under an Open Source License (apache-2.0). Please refer to the LICENSE file for full details.
When using this model in your work, please cite both this model and the base canopylabs/orpheus-3b-0.1-ft model as follows:
@misc{canopylabsorpheus,
  title={Orpheus-3b-0.1-ft: A Multilingual Text-to-Speech Model},
  author={Canopy Labs},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/canopylabs/orpheus-3b-0.1-ft}},
  note={Fine-tuned version of Orpheus for expressive TTS}
}
@misc{hypaorpheus4bit,
  title={Hypa_Orpheus-3b-0.1-ft (LoRA-4bit)},
  author={Hypa AI},
  year={2025},
  note={Fine-tuned Orpheus TTS on African languages},
  url={https://huggingface.co/hypaai/Hypa_Orpheus-3b-0.1-ft-unsloth-bnb-4bit}
}
👏 Acknowledgements
Canopy Labs Team: For creating the foundational model and open-sourcing it.
AfroVoices Experts: For their translation expertise and high-quality datasets.
Community Support: We thank all supporters, contributors, and users.
📞 Contact and Contributions
For any questions, issues, or contributions, please open an issue in this repository or contact hypa.ai.ng@gmail.com. Contributions are welcome!
🌟 Closing Remarks
By making Hypa_Orpheus available, we hope to empower research and development in multilingual speech technologies for African languages.
Hypa AI remains steadfast in its mission to pioneer intelligent solutions that are not just technologically advanced but are also culturally aware, ensuring that the future of AI is as diverse and inclusive as the world it serves.
AfroVoices, a subsidiary of Hypa AI, is dedicated to amplifying African voices, languages, and cultures in the intelligence age. Focused on bridging the digital representation gap, AfroVoices curates datasets and resources for African languages, promoting inclusivity and cultural appreciation in AI technologies. Their mission goes beyond technological innovation, aiming to celebrate the richness of African linguistic diversity on a global stage.
💻 Usage
Unsloth Inference
Download the needed packages.
```python
%%capture
import os
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth
else:
    # Do this only in Colab notebooks! Otherwise use pip install unsloth
    !pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl==0.15.2 triton cut_cross_entropy unsloth_zoo
    !pip install sentencepiece protobuf datasets huggingface_hub hf_transfer
    !pip install --no-deps unsloth
!pip install snac
```
Download the models (both the SNAC encoder/decoder as well as our finetuned Hypa_Orpheus).
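A minimal sketch of this step, wrapped in a function since it downloads weights and needs a GPU. The 4-bit repo id comes from the citation above; treat it as a placeholder and substitute the id of the checkpoint you actually want (e.g. the merged 16-bit one from this card). `snac_24khz` is the 24 kHz SNAC codec the base Orpheus model pairs with:

```python
# Repo ids (the TTS id below is the 4-bit checkpoint cited above; swap in the
# merged 16-bit repo id if you are using this card's merged weights).
TTS_REPO = "hypaai/Hypa_Orpheus-3b-0.1-ft-unsloth-bnb-4bit"
SNAC_REPO = "hubertsiuzdak/snac_24khz"  # SNAC audio codec used by Orpheus

def load_models(max_seq_length: int = 2048):
    """Download and load the TTS model (via Unsloth) and the SNAC codec."""
    from unsloth import FastLanguageModel
    from snac import SNAC

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=TTS_REPO,
        max_seq_length=max_seq_length,  # room for text prompt + audio tokens
        load_in_4bit=True,              # set False for merged 16-bit weights
    )
    FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference mode
    snac_model = SNAC.from_pretrained(SNAC_REPO)
    return model, tokenizer, snac_model
```

At inference time, the language model generates audio tokens from the `voice: text` prompt, and the SNAC decoder turns those tokens back into a waveform.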