Shona_TTS Open-source Text-to-Speech Model - Free implementation of natural voice conversion from Shona text

Shona TTS

Developed by Fastino06

This is a Shona text-to-speech model, fine-tuned from SpeechT5, that converts Shona text into natural-sounding speech.

Speech Synthesis

Transformers

#Shona TTS #SpeechT5 fine-tuning #African language synthesis

Release Time: 6/3/2024

Model Overview

This model is designed for Shona (sna) text-to-speech. It is built on the SpeechT5 architecture and converts input Shona text into speech waveforms.

Model Features

Shona Language Support

Speech synthesis capability specifically optimized for Shona

Based on SpeechT5

Fine-tuned from the SpeechT5 architecture

Ease of Use

Provides a simple API for easy integration

Model Capabilities

Shona text-to-speech

Speech waveform generation

Use Cases

Education

Language Learning Aid

Provides pronunciation reference for Shona learners

Helps learners master correct Shona pronunciation

Assistive Technology

Assistance for Visually Impaired

Converts text content into speech output

Helps visually impaired individuals access information
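
For the assistive use case above, longer passages would probably need to be split into shorter pieces before synthesis. The following is a minimal sketch of one way to do that, not part of the model's own tooling: the helper name `split_into_chunks` and the `max_chars` default are hypothetical, and the sentence rule (split after `.`, `!` or `?`) is a simplifying assumption.

```python
import re


def split_into_chunks(text, max_chars=200):
    """Group sentences into chunks of at most max_chars where possible."""
    # Split after ., ! or ? followed by whitespace; the punctuation is kept.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be passed to the model separately and the resulting waveforms concatenated. A single sentence longer than `max_chars` is left as one oversized chunk rather than being cut mid-sentence.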

🚀 Shona Text-to-Speech

This repository provides a text-to-speech (TTS) model checkpoint for the Shona (sna) language that converts text into natural-sounding speech.

🚀 Quick Start

To start using the Shona Text-to-Speech model, first install the necessary libraries:

pip install --upgrade transformers accelerate

Then, you can run inference with the following Python code:

# Load the tokenizer and model from the Hub
import torch
from transformers import AutoTokenizer, AutoModelForTextToWaveform

tokenizer = AutoTokenizer.from_pretrained("Fastino06/ff")
model = AutoModelForTextToWaveform.from_pretrained("Fastino06/ff")

text = "some example text in the Shona language"
inputs = tokenizer(text, return_tensors="pt")

# Run inference without tracking gradients
with torch.no_grad():
    output = model(**inputs).waveform

The resulting waveform can be saved as a .wav file:

from scipy.io import wavfile

# Convert the tensor to a 1-D NumPy array before writing
wavfile.write("fassy.wav", rate=model.config.sampling_rate, data=output.squeeze().cpu().numpy())

Or displayed in a Jupyter Notebook / Google Colab:

from IPython.display import Audio

# Pass the waveform as a 1-D NumPy array
Audio(output.squeeze().cpu().numpy(), rate=model.config.sampling_rate)
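
If SciPy is not installed, the waveform could also be written with Python's standard library. The sketch below assumes the waveform is a flat sequence of floats in the range -1.0 to 1.0 (for example `output.squeeze().tolist()`); the helper names `float_to_pcm16` and `write_wav` are hypothetical and not part of any library.

```python
import array
import sys
import wave


def float_to_pcm16(samples):
    """Clip floats to [-1.0, 1.0] and scale them to signed 16-bit integers."""
    pcm = array.array("h")
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))
        pcm.append(int(round(clipped * 32767)))
    return pcm


def write_wav(path, samples, sampling_rate):
    """Write mono 16-bit PCM audio and return the clip duration in seconds."""
    pcm = float_to_pcm16(samples)
    frame_count = len(pcm)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data must be little-endian
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)              # mono
        wav_file.setsampwidth(2)              # 2 bytes = 16 bits per sample
        wav_file.setframerate(sampling_rate)
        wav_file.writeframes(pcm.tobytes())
    return frame_count / sampling_rate
```

With the variables from the inference code, a call would look like `write_wav("fassy.wav", output.squeeze().tolist(), model.config.sampling_rate)`. The returned duration is simply the number of samples divided by the sampling rate.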

✨ Features

Language Support: Specifically designed for the Shona language, enabling high-quality text-to-speech conversion.
Model Architecture: Based on the SpeechT5 model, fine-tuned for Shona TTS.
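
Because the card states that the checkpoint is fine-tuned from SpeechT5, it may need to be loaded with the SpeechT5-specific classes in `transformers` rather than the generic auto classes. The sketch below is an illustration under several unconfirmed assumptions: that the `Fastino06/ff` repository ships SpeechT5 processor files, that the stock `microsoft/speecht5_hifigan` vocoder is compatible, and that a zero speaker embedding is an acceptable placeholder. The function name `synthesize_speecht5` is hypothetical.

```python
def synthesize_speecht5(text, repo_id="Fastino06/ff"):
    """Return (waveform, sampling_rate) using the SpeechT5-specific classes."""
    # Heavy dependencies are imported lazily so the function can be defined
    # without loading them.
    import torch
    from transformers import (
        SpeechT5ForTextToSpeech,
        SpeechT5HifiGan,
        SpeechT5Processor,
    )

    # Assumption: the repository contains SpeechT5 processor files and weights.
    processor = SpeechT5Processor.from_pretrained(repo_id)
    model = SpeechT5ForTextToSpeech.from_pretrained(repo_id)
    # Assumption: the stock SpeechT5 HiFi-GAN vocoder works with this checkpoint.
    vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

    inputs = processor(text=text, return_tensors="pt")
    # Placeholder 512-dimensional speaker embedding; an x-vector from the
    # fine-tuning speaker would be expected to sound better.
    speaker_embeddings = torch.zeros((1, 512))

    with torch.no_grad():
        speech = model.generate_speech(
            inputs["input_ids"], speaker_embeddings, vocoder=vocoder
        )
    # The SpeechT5 HiFi-GAN vocoder produces 16 kHz audio.
    return speech.numpy(), 16000
```

If the generic auto classes already return a usable waveform for this checkpoint, this variant is unnecessary.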

📚 Documentation

Model Details

Property	Details
Developed by	Fastino Mateteva
Model Type	Text to Speech
Language(s) (NLP)	Shona
Finetuned from model	SpeechT5

📄 License

This model is licensed under the CC-BY-NC-4.0 license.

BibTeX citation

This model was developed by Fastino Mateteva.
