Kokoro is an open-weight text-to-speech (TTS) model with 82 million parameters, known for its lightweight architecture and high audio quality while remaining fast and cost-effective.
Its Apache-licensed weights allow it to be deployed in scenarios ranging from production environments to personal projects.
Model Features
Lightweight architecture
Despite its smaller parameter size, it delivers audio quality comparable to larger models.
Cost efficiency
Less than $1 per million characters of text input, or equivalently under $0.06 per hour of audio output (a back-of-envelope check follows this list).
Multilingual support
Supports 8 languages and 54 voices, suitable for diverse application scenarios.
Open-source license
Licensed under Apache, allowing free deployment in commercial and personal projects.
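For a quick sanity check of how the two cost figures above relate (both numbers come from the cost item in this list; no new pricing is assumed), dividing one by the other gives the implied synthesis throughput:
# Back-of-envelope check relating the two cost figures quoted above (illustrative only).
cost_per_million_chars = 1.00   # < $1 per million characters of text input
cost_per_hour_of_audio = 0.06   # < $0.06 per hour of audio output
chars_per_hour = cost_per_hour_of_audio * 1_000_000 / cost_per_million_chars
print(chars_per_hour)           # ~60,000 characters of text per hour of generated audio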
Model Capabilities
Text-to-speech
Multilingual speech synthesis (see the sketch after this list)
Efficient audio generation
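As a sketch of the multilingual capability, KPipeline takes a lang_code when it is constructed; the quickstart further down this page uses 'a' (American English). The other language codes and voice names here are assumptions based on the kokoro/misaki documentation and should be checked against the installed version, and non-English pipelines assume espeak-ng is installed (see the setup commands below).
from kokoro import KPipeline

# 'a' / 'af_heart' match the quickstart below; the Spanish and French
# codes and voices are assumed from the kokoro docs and may vary by version.
examples = [
    ('a', 'af_heart', 'Hello, world!'),       # American English
    ('e', 'ef_dora',  'Hola, mundo!'),        # Spanish (assumed code/voice)
    ('f', 'ff_siwis', 'Bonjour le monde !'),  # French (assumed code/voice)
]
for lang_code, voice, text in examples:
    pipeline = KPipeline(lang_code=lang_code)
    for _, _, audio in pipeline(text, voice=voice):
        print(lang_code, voice, audio.shape)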
Use Cases
Commercial applications
Voice assistants
Provides high-quality speech output for commercial applications.
Efficient and low-cost speech synthesis solution.
Audiobooks
Generates natural and fluent audiobook content.
High-quality multilingual speech output.
Personal projects
Personal voice assistants
Offers customized speech output for personal projects.
Lightweight and easy-to-deploy solution.
🚀 Kokoro
Kokoro is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Kokoro can be deployed anywhere from production environments to personal projects.
!pip install -q "kokoro>=0.9.2" soundfile
!apt-get -qq -y install espeak-ng > /dev/null 2>&1
from kokoro import KPipeline
from IPython.display import display, Audio
import soundfile as sf
import torch
pipeline = KPipeline(lang_code='a')
text = '''
[Kokoro](/kˈOkəɹO/) is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, [Kokoro](/kˈOkəɹO/) can be deployed anywhere from production environments to personal projects.
'''
generator = pipeline(text, voice='af_heart')
# gs: graphemes (the text chunk), ps: phonemes, audio: 24 kHz waveform
for i, (gs, ps, audio) in enumerate(generator):
    print(i, gs, ps)
    display(Audio(data=audio, rate=24000, autoplay=i==0))
    sf.write(f'{i}.wav', audio, 24000)
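If a single output file is preferred over per-chunk WAVs, the chunks can be concatenated. A minimal sketch continuing from the snippet above (it reuses pipeline and text, and assumes numpy is available):
import numpy as np

# Collect every generated chunk and write one continuous 24 kHz file.
chunks = [np.asarray(audio) for _, _, audio in pipeline(text, voice='af_heart')]
sf.write('full.wav', np.concatenate(chunks), 24000)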
Under the hood, kokoro uses misaki, a G2P library at https://github.com/hexgrad/misaki.
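The bracketed markup in the sample text above, e.g. [Kokoro](/kˈOkəɹO/), is how misaki lets you pin a word to explicit phonemes instead of relying on automatic G2P; unmarked words are phonemized normally. A minimal sketch of the same idea:
from kokoro import KPipeline

pipeline = KPipeline(lang_code='a')
# Force the pronunciation of "Kokoro" with misaki's [word](/phonemes/) markup.
marked = "[Kokoro](/kˈOkəɹO/) is a lightweight TTS model."
for graphemes, phonemes, audio in pipeline(marked, voice='af_heart'):
    print(graphemes)  # the text chunk as written
    print(phonemes)   # the phoneme string actually synthesized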
✨ Features
Lightweight and High-Quality: Despite having a lightweight architecture with 82 million parameters, it offers comparable quality to larger models.
Fast and Cost-Efficient: It is significantly faster and more cost-efficient, making it suitable for various scenarios.
Apache-Licensed: With Apache-licensed weights, it can be freely deployed in production environments and personal projects.
Data: Kokoro was trained exclusively on permissive/non-copyrighted audio data and IPA phoneme labels. Examples of permissive/non-copyrighted audio include:
Public domain audio
Audio licensed under Apache, MIT, etc.
Synthetic audio[1] generated by closed[2] TTS models from large providers
[1] https://copyright.gov/ai/ai_policy_guidance.pdf
[2] No synthetic audio from open TTS models or "custom voice clones"
Total Dataset Size: A few hundred hours of audio
Total Training Cost: About $1000 for 1000 hours of A100 80GB vRAM
Creative Commons Attribution
The following CC BY audio was part of the dataset used to train Kokoro v1.0.
This is an Apache-licensed model, and Kokoro has been deployed in numerous projects and commercial APIs. We welcome the deployment of the model in real use cases.
⚠️ Caution
Fake websites like kokorottsai_com (snapshot: https://archive.ph/nRRnk) and kokorotts_net (snapshot: https://archive.ph/60opa) are likely scams masquerading under the banner of a popular model.
Any website containing "kokoro" in its root domain (e.g. kokorottsai_com, kokorotts_net) is NOT owned by and NOT affiliated with this model page or its author, and attempts to imply otherwise are red flags.