ADIA_TTS is an open-source Wolof text-to-speech (TTS) model developed by CONCREE. Built on the parler-tts-mini-multilingual-v1.1 model, it generates natural, fluent speech whose voice characteristics can be controlled through short text descriptions, marking significant progress in Wolof speech synthesis.
Model Features
Multilingual Support
Built on the multilingual parler-tts-mini-multilingual-v1.1 base model; this release is fine-tuned specifically for Wolof.
High-Quality Speech Synthesis
Generates natural and fluent speech suitable for various application scenarios.
Voice Style Control
Controls voice characteristics through descriptions, such as clear, professional, or educational tones.
Efficient Training
Trained on 40 hours of Wolof speech data and fine-tuned for 100 epochs (about 168 hours of training).
Model Capabilities
Wolof text-to-speech
Voice style control
High-quality speech generation
Use Cases
Education
Language Learning
Synthesizes speech for Wolof learning materials, producing clear, pedagogical audio that helps learners improve listening comprehension.
Professional Applications
Formal Speeches
Generates professional, clear, and composed speech suitable for formal occasions.
Daily Applications
Natural Conversations
Generates warm, natural speech with a fluid, conversational delivery, well suited to everyday conversations and interactions.
🚀 Adia_TTS Wolof
Adia_TTS is an open-source Wolof text-to-speech model developed by CONCREE. Based on the parler-tts-mini-multilingual-v1.1 model, it represents a significant advancement in Wolof speech synthesis.
✨ Features
Trained on 40 hours of Wolof speech data.
Fine-tuned for 100 epochs (~168 hours of training).
Natural and fluent voice quality.
Single voice, with voice characteristics controllable via a text description.
🚀 Quick Start
import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf
device = "cuda:0"if torch.cuda.is_available() else"cpu"# Loading the model
model = ParlerTTSForConditionalGeneration.from_pretrained("CONCREE/Adia_TTS").to(device)
tokenizer = AutoTokenizer.from_pretrained("CONCREE/Adia_TTS")
# Wolof text to synthesize
text = "Entreprenariat ci Senegal dafa am solo lool ci yokkuteg koom-koom, di gëna yokk liggéey ak indi gis-gis yu bees ci dëkk bi."# Vocal style description
description = "A clear and educational voice, with a flow adapted to learning"# Generation
input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_ids = tokenizer(text, return_tensors="pt").input_ids.to(device)
audio = model.generate(
    input_ids=input_ids,
    prompt_input_ids=prompt_ids,
)
# Saving
sf.write("output.wav", audio.cpu().numpy().squeeze(), model.config.sampling_rate)
Advanced Usage
generation_config = {
    "temperature": 0.8,         # Controls the variability of the output
    "max_new_tokens": 1000,     # Maximum length of the generated sequence
    "do_sample": True,          # Enables random sampling
    "top_k": 50,                # Limits the number of considered tokens
    "repetition_penalty": 1.2,  # Penalizes token repetition
}

audio = model.generate(
    input_ids=input_ids,
    prompt_input_ids=prompt_ids,
    **generation_config,
)
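As a rule of thumb, lowering temperature makes delivery more consistent across runs, while raising repetition_penalty discourages looping artifacts on longer inputs; the values above are starting points to tune by ear.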
Different Vocal Styles
Natural Voice
description = "A warm and natural voice, with a conversational flow"
Professional Voice
description = "A professional, clear and composed voice, perfect for formal presentations"
Educational Voice
description = "A clear and educational voice, with a flow adapted to learning"
📚 Documentation
Technical Specifications
Model Type: parler-tts-mini-multilingual-v1.1
Model Size: 1.88 GB
Model Format: PyTorch
Sampling Frequency: 24 kHz
Audio Encoding: 16-bit PCM
Performance
Average Inference Time: seconds/sentence (CPU), 20 seconds/sentence (GPU)
Memory Consumption: 3.9 GB RAM (recommended minimum)
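A saved file can be checked against these figures with soundfile; this is a small illustrative sketch, not part of the model's API:

import soundfile as sf

# Inspect a file produced by the quick start example
info = sf.info("output.wav")
print(info.samplerate)  # expected: 24000 (24 kHz)
print(info.subtype)     # expected: PCM_16 (16-bit PCM, soundfile's WAV default)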
Limitations
Reduced performance on very long sentences.
Limited handling of numbers and dates.
Relatively long initial model loading time.
The model is limited to a maximum of 200 characters per inference; longer texts require manual segmentation.
The quality of transitions between segments may vary depending on the chosen segmentation method.
Segmenting text at natural boundaries (sentences, paragraphs) gives the best results; a sketch of one such approach follows this list.
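A minimal segmentation sketch, assuming the model, tokenizer, device, description, and sf from the quick start are in scope and that long_text holds the text to synthesize (split_text is a hypothetical helper, not part of the model's API):

import re
import numpy as np

def split_text(text, max_chars=200):
    # Hypothetical helper: split on sentence-ending punctuation, then pack
    # sentences into chunks of at most max_chars characters. A single
    # sentence longer than max_chars still becomes its own (oversized) chunk.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = (current + " " + sentence).strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks

# Synthesize each chunk separately, then concatenate the audio segments.
input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
segments = []
for chunk in split_text(long_text):
    prompt_ids = tokenizer(chunk, return_tensors="pt").input_ids.to(device)
    audio = model.generate(input_ids=input_ids, prompt_input_ids=prompt_ids)
    segments.append(audio.cpu().numpy().squeeze())

sf.write("long_output.wav", np.concatenate(segments), model.config.sampling_rate)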
References
@misc{CONCREE-2024-Adia_TTS,
author = {CONCREE},
title = {Adia_TTS},
year = {2025},
publisher = {Hugging Face},
journal = {Hugging Face repository},
howpublished = {\url{https://huggingface.co/CONCREE/Adia_TTS}}
}
@misc{lyth2024natural,
title={Natural language guidance of high-fidelity text-to-speech with synthetic annotations},
author={Dan Lyth and Simon King},
year={2024},
eprint={2402.01912},
archivePrefix={arXiv},
primaryClass={cs.SD}
}
📄 License
This project is licensed under the Apache 2.0 license. See the LICENSE file for more details.
Usage Conditions
Users commit to using the model in a way that respects the Wolof language and Senegalese culture.
We encourage the use of this model to develop solutions that improve digital accessibility for Wolof speakers and contribute to reducing the digital divide. Projects aiming at digital inclusion are particularly welcome.
Any use of the model must mention CONCREE as the original creator. Users are strongly encouraged to share their improvements with the community.
Commercial use is permitted under the terms of the Apache 2.0 license.