🚀 Shona Text-to-Speech
This repository offers a text-to-speech (TTS) model checkpoint for the Shona (sna) language, facilitating the conversion of text into natural-sounding speech.
🚀 Quick Start
To start using the Shona Text-to-Speech model, first install the necessary libraries:
pip install --upgrade transformers accelerate torch scipy
Then, you can run inference with the following Python code:
import torch
from transformers import AutoTokenizer, AutoModelForTextToWaveform
# Load the tokenizer and model checkpoint from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("Fastino06/ff")
model = AutoModelForTextToWaveform.from_pretrained("Fastino06/ff")
text = "some example text in the Shona language"
inputs = tokenizer(text, return_tensors="pt")
# Run inference without tracking gradients
with torch.no_grad():
    output = model(**inputs).waveform
The resulting waveform can be saved as a .wav file:
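import scipy.io.wavfile
# Drop the batch dimension and convert the tensor to a NumPy array before writing
scipy.io.wavfile.write("fassy.wav", rate=model.config.sampling_rate, data=output.squeeze().float().numpy())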
Or displayed in a Jupyter Notebook / Google Colab:
from IPython.display import Audio
Audio(output.numpy(), rate=model.config.sampling_rate)
✨ Features
- Language Support: Specifically designed for the Shona language, enabling high-quality text-to-speech conversion.
- Model Architecture: Based on the SpeechT5 model, fine-tuned for Shona TTS.
📦 Installation
pip install --upgrade transformers accelerate torch scipy
💻 Usage Examples
Basic Usage
import torch
from transformers import AutoTokenizer, AutoModelForTextToWaveform
# Load the tokenizer and model checkpoint from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("Fastino06/ff")
model = AutoModelForTextToWaveform.from_pretrained("Fastino06/ff")
text = "some example text in the Shona language"
inputs = tokenizer(text, return_tensors="pt")
# Run inference without tracking gradients
with torch.no_grad():
    output = model(**inputs).waveform
Saving the Output
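import scipy.io.wavfile
# Drop the batch dimension and convert the tensor to a NumPy array before writing
scipy.io.wavfile.write("fassy.wav", rate=model.config.sampling_rate, data=output.squeeze().float().numpy())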
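Some audio players and downstream tools handle 16-bit PCM files more reliably than floating-point WAV files. The following is a minimal sketch of converting the waveform before saving. The helper name `to_int16_pcm` is hypothetical and not part of this repository, and the sketch assumes the model output is a floating-point waveform with values roughly in the range [-1.0, 1.0].

```python
import numpy as np


def to_int16_pcm(waveform):
    """Convert a float waveform to 1-D 16-bit PCM samples (hypothetical helper)."""
    # Accept anything NumPy can convert (e.g. the result of output.numpy())
    # and flatten a (1, num_samples) batch to (num_samples,).
    samples = np.asarray(waveform, dtype=np.float32).reshape(-1)
    # Assumption: valid samples lie in [-1.0, 1.0]; clip peaks outside
    # that range so they do not wrap around after the integer cast.
    samples = np.clip(samples, -1.0, 1.0)
    # Scale to the int16 range and cast.
    return (samples * 32767.0).astype(np.int16)
```

With the `output` tensor from Basic Usage, the result could then be passed as the `data` argument, for example `data=to_int16_pcm(output.numpy())`, in the `scipy.io.wavfile.write` call above.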
Displaying in Notebook
from IPython.display import Audio
Audio(output.numpy(), rate=model.config.sampling_rate)
📚 Documentation
Model Details
| Property | Details |
|---|---|
| Developed by | Fastino Mateteva |
| Model Type | Text to Speech |
| Language(s) (NLP) | Shona |
| Finetuned from model | SpeechT5 |
📄 License
This model is licensed under the CC-BY-NC-4.0 license.
BibTeX citation
This model was developed by Fastino Mateteva.