🚀 Orpheus Bangla GGUF
This is a proof-of-concept fine-tuned Orpheus 3B TTS model for Bengali language support.
🚀 Quick Start
This README provides detailed information about the Orpheus Bangla GGUF model, including its description, intended use, training details, limitations, and related training resources.
✨ Features
- Bengali Support: Specifically fine-tuned for the Bengali language, enabling text-to-speech conversion in Bengali.
- Based on Orpheus 3B: Built upon the Orpheus 3B TTS model.
📚 Documentation
Model Description
This model is a proof-of-concept fine-tuned version of the Orpheus 3B TTS (Text-to-Speech) model for Bengali language support. The model has been trained using the SUST-CSE-Speech/banspeech dataset, which contains 955 audio samples split from audiobooks. This fine-tuning was performed for 10 epochs on a single Google Colab instance equipped with a T4 GPU.
⚠️ Important Note
This model is currently in the proof-of-concept phase and is not recommended for production use.
Intended Use
This model can be used to generate Bengali speech from text. It is suited to experimenting with Bengali TTS systems, for example in audiobook narration, conversational AI, or other speech synthesis tasks.
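The model card does not show how generated output becomes audio. As a minimal sketch, assuming this fine-tune keeps the token layout used by the upstream Orpheus reference decoder (audio emitted as `<custom_token_N>` strings, seven tokens per frame, each offset by its slot in the frame and regrouped into three codec layers), the post-processing could look roughly like the following. The constants and function names are assumptions for illustration and should be checked against the upstream Orpheus code before use; turning the layers into a waveform still requires the audio codec decoder, which is not shown.

```python
import re

# All three constants are assumptions taken from the upstream Orpheus layout,
# not values stated in this model card.
CODEBOOK_SIZE = 4096    # assumed number of entries per codebook
TOKEN_ID_OFFSET = 10    # assumed offset of the first audio token
TOKENS_PER_FRAME = 7    # assumed number of audio tokens per frame

_TOKEN_RE = re.compile(r"<custom_token_(\d+)>")


def tokens_to_codes(generated_text: str) -> list[int]:
    """Convert each <custom_token_N> to a codebook index based on its frame slot."""
    codes = []
    for position, match in enumerate(_TOKEN_RE.finditer(generated_text)):
        slot = position % TOKENS_PER_FRAME
        code = int(match.group(1)) - TOKEN_ID_OFFSET - slot * CODEBOOK_SIZE
        if not 0 <= code < CODEBOOK_SIZE:
            raise ValueError(f"token {position} decodes to out-of-range code {code}")
        codes.append(code)
    return codes


def split_into_layers(codes: list[int]) -> tuple[list[int], list[int], list[int]]:
    """Regroup complete 7-code frames into three layers (1, 2, and 4 codes per frame)."""
    usable = len(codes) - len(codes) % TOKENS_PER_FRAME  # drop a trailing partial frame
    layer_0, layer_1, layer_2 = [], [], []
    for start in range(0, usable, TOKENS_PER_FRAME):
        frame = codes[start:start + TOKENS_PER_FRAME]
        layer_0.append(frame[0])
        layer_1.extend([frame[1], frame[4]])
        layer_2.extend([frame[2], frame[3], frame[5], frame[6]])
    return layer_0, layer_1, layer_2


# Synthetic one-frame example: encode codes 1..7, then decode them again.
sample_text = "".join(
    f"<custom_token_{code + TOKEN_ID_OFFSET + slot * CODEBOOK_SIZE}>"
    for slot, code in enumerate(range(1, 8))
)
print(split_into_layers(tokens_to_codes(sample_text)))
```

The synthetic frame only exercises the index arithmetic; real input would be the text streamed from whichever GGUF runtime is used to run the model.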
Model Training
| Property | Details |
| --- | --- |
| Dataset | SUST-CSE-Speech/banspeech (955 audiobook audio samples) |
| Training Epochs | 10 epochs |
| Hardware | Google Colab (single T4 GPU) |
| Training Script | A modified Unsloth fine-tuning script was used for the training. The script is available on GitHub: [Orpheus TTS Training Script](https://github.com/asiff00/Training-TTS/blob/main/orpheus/orpheus.ipynb). |
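
To put the training budget in perspective, the figures above can be turned into rough step counts. The sketch below is illustrative only: the dataset size and epoch count come from this model card, while the effective batch sizes are hypothetical, since the card does not state the one actually used.

```python
import math

NUM_SAMPLES = 955  # from the model card
NUM_EPOCHS = 10    # from the model card


def optimizer_steps(num_samples: int, epochs: int, effective_batch_size: int) -> int:
    """Total optimizer steps, counting the final partial batch of each epoch."""
    steps_per_epoch = math.ceil(num_samples / effective_batch_size)
    return steps_per_epoch * epochs


# Total number of training examples seen across all epochs.
print("examples seen:", NUM_SAMPLES * NUM_EPOCHS)

# Hypothetical effective batch sizes (not stated in the model card).
for batch_size in (1, 4, 8):
    steps = optimizer_steps(NUM_SAMPLES, NUM_EPOCHS, batch_size)
    print(f"batch size {batch_size}: {steps} optimizer steps")
```

Under any of these assumed batch sizes the run amounts to only a few thousand updates at most, which is consistent with the proof-of-concept framing.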
Limitations
- This model was trained on a small dataset and for a limited number of epochs, which may lead to less natural or less accurate speech synthesis.
- Since this is a proof-of-concept model, the synthesis quality may vary based on input text and different conditions. It is not optimized for production environments.
Training Resources
- [TTS Training: Style-TTS2](https://github.com/asiff00/Training-TTS/tree/main/style-tts2)
- [TTS Training: VIT-TTS](https://github.com/asiff00/Training-TTS/tree/main/vit-tts)
- [On-Device Speech-to-Speech Conversational AI](https://github.com/asiff00/On-Device-Speech-to-Speech-Conversational-AI)
- [Bangla Llama](https://github.com/asiff00/Bangla-Llama)
- [Bangla RAG Pipeline, PoRAG](https://github.com/Bangla-RAG/PoRAG)
📄 License
The model is licensed under the Apache-2.0 license.