IndicF5: High-Quality Text-to-Speech for Indian Languages
IndicF5 is a near-human polyglot Text-to-Speech (TTS) model that offers high-quality speech synthesis for multiple Indian languages.
Datasets
- ai4bharat/indicvoices_r
- ai4bharat/Rasa
Supported Languages
- as (Assamese)
- bn (Bengali)
- gu (Gujarati)
- mr (Marathi)
- hi (Hindi)
- kn (Kannada)
- ml (Malayalam)
- or (Odia)
- pa (Punjabi)
- ta (Tamil)
- te (Telugu)
Pipeline Tag
text-to-speech
We release IndicF5, a near-human polyglot Text-to-Speech (TTS) model trained on 1417 hours of high-quality speech from Rasa, IndicTTS, LIMMITS, and IndicVoices-R.
IndicF5 supports 11 Indian languages:
Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, Telugu.
Quick Start
Installation
conda create -n indicf5 python=3.10 -y
conda activate indicf5
pip install git+https://github.com/ai4bharat/IndicF5.git
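After installing, it can help to confirm that the packages imported by the usage example are visible in the active environment. The following is a minimal sketch, not part of IndicF5: `missing_packages` and the `REQUIRED` list are hypothetical names, and the list is simply assumed from the imports in the usage example.

```python
import importlib.util
import sys

# Assumption: these are the top-level packages imported by the usage example.
REQUIRED = ["transformers", "numpy", "soundfile"]


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Run it inside the `indicf5` conda environment; any package it reports as missing can be installed with `pip` before trying the example.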
Usage Examples
Basic Usage
To generate speech, you need to provide three inputs:
- Text to synthesize: The content you want the model to speak.
- A reference prompt audio: An example speech clip that guides the model's prosody and speaker characteristics.
- Text spoken in the reference prompt audio: The transcript of the reference prompt audio.
from transformers import AutoModel
import numpy as np
import soundfile as sf

# Load IndicF5 from the Hugging Face Hub (the repository ships custom model code)
repo_id = "ai4bharat/IndicF5"
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)

# Generate speech for a Hindi sentence, guided by a Punjabi reference prompt and its transcript.
# Hindi text, roughly: "Hello! Like music, life is beautiful too; you just need to know how to live it in the right rhythm."
audio = model(
    "नमस्ते! संगीत की तरह जीवन भी खूबसूरत होता है, बस इसे सही ताल में जीना आना चाहिए.",
    ref_audio_path="prompts/PAN_F_HAPPY_00001.wav",
    ref_text="ਭਹੰਪੀ ਵਿੱਚ ਸਮਾਰਕਾਂ ਦੇ ਭਵਨ ਨਿਰਮਾਣ ਕਲਾ ਦੇ ਵੇਰਵੇ ਗੁੰਝਲਦਾਰ ਅਤੇ ਹੈਰਾਨ ਕਰਨ ਵਾਲੇ ਹਨ, ਜੋ ਮੈਨੂੰ ਖੁਸ਼ ਕਰਦੇ ਹਨ।"
)

# Normalize int16 PCM to float32 in [-1, 1) and save at 24 kHz
if audio.dtype == np.int16:
    audio = audio.astype(np.float32) / 32768.0
sf.write("namaste.wav", np.array(audio, dtype=np.float32), samplerate=24000)
print("Audio saved successfully.")
You can find example prompt audios used here.
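The three inputs and the dtype handling in the example above can be wrapped in two small helpers. This is a sketch under stated assumptions rather than part of the IndicF5 API: `check_inputs` and `to_float32` are hypothetical names, and the only behavior taken from the example is the division of int16 samples by 32768.

```python
from pathlib import Path

import numpy as np


def check_inputs(text, ref_audio_path, ref_text):
    """Fail early if any of the three inputs is unusable (hypothetical helper)."""
    if not text or not text.strip():
        raise ValueError("text to synthesize is empty")
    if not ref_text or not ref_text.strip():
        raise ValueError("reference transcript is empty")
    if not Path(ref_audio_path).is_file():
        raise FileNotFoundError(f"reference audio not found: {ref_audio_path}")


def to_float32(audio):
    """Scale int16 PCM to float32 in [-1.0, 1.0); cast any other dtype to float32 as-is."""
    audio = np.asarray(audio)
    if audio.dtype == np.int16:
        return audio.astype(np.float32) / 32768.0
    return audio.astype(np.float32)
```

With these in place, one would call `check_inputs(text, ref_audio_path, ref_text)` before invoking the model and pass the model output through `to_float32` before writing it with `sf.write`.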
Documentation
Terms of Use
Important Note
By using this model, you agree to only clone voices for which you have explicit permission. Unauthorized voice cloning is strictly prohibited. Any misuse of this model is the responsibility of the user.
References
We would like to extend our gratitude to the authors of [F5-TTS](https://github.com/SWivid/F5-TTS) for their invaluable contributions and inspiration to this work. Their efforts have played a crucial role in advancing the field of text-to-speech synthesis.
Citation
If you use IndicF5 in your research or projects, please consider citing it:
BibTeX
@misc{AI4Bharat_IndicF5_2025,
author = {Praveen S V and Srija Anand and Soma Siddhartha and Mitesh M. Khapra},
title = {IndicF5: High-Quality Text-to-Speech for Indian Languages},
year = {2025},
url = {https://github.com/AI4Bharat/IndicF5},
}