Kokoro-82M Open-Source TTS Model: Quality Comparable to Larger Models, Fast, Low-Cost, and Free to Use


Kokoro 82M

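Developed by prince-canuma

Kokoro is an open-source TTS model with 82 million parameters. Its audio quality is comparable to that of larger models, and it is significantly faster and cheaper to run.

Speech synthesis | English | License: Apache-2.0 | #lightweight-TTS #multilingual-speech-synthesis #cost-effective-speech-generation

Downloads: 376

Released: 2/26/2025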

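Model Overview

Kokoro is a lightweight text-to-speech model based on the StyleTTS 2 architecture. It supports multiple languages and voices and is suitable for both production environments and personal projects.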

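Model Features

Lightweight and efficient

A lightweight 82-million-parameter architecture that keeps audio quality high while allowing fast inference

Multilingual support

Supports 8 languages and 54 voices

Open-source license

Released under the Apache-2.0 license; free to use in commercial and personal projects

Low training cost

About $1,000 in total training cost, on A100 GPUs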

Model Capabilities

High-quality speech synthesis

Multilingual speech generation

Voice switching

Speaking-rate adjustment

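Use Cases

Content creation

Audiobook generation

Converts written text into natural-sounding speech

Produces high-quality, expressive speech

Video dubbing

Adds multilingual voice-overs to video content

Supports speech output in multiple languages and voices

Assistive technology

Voice assistance applications

Provides text read-aloud for visually impaired users

Produces clear, natural speech output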

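🚀 Kokoro: A Lightweight, Efficient Text-to-Speech Model

Kokoro is an open-weight text-to-speech (TTS) model with 82 million parameters. Despite its lightweight architecture, it delivers quality comparable to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Kokoro can be deployed anywhere from production environments to personal projects.

⬆️ Kokoro has been upgraded to v1.0! See Releases.

✨ You can now install it with pip install kokoro! See Usage.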

Releases
Usage
SAMPLES.md
VOICES.md
Model Information
Training Details
Creative Commons Attribution
Acknowledgements

🚀 Quick Start

Install the Kokoro inference library with the following command:

pip install kokoro

After installation, follow the usage example below.

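✨ Key Features

Lightweight architecture: only 82 million parameters, with speech quality comparable to larger models.
Efficient: faster and cheaper to run, suitable for deployment in a wide range of scenarios.
Multilingual support: American English, British English, French, Hindi, and other languages.
Open weights: Apache-licensed, free to use in production environments and personal projects.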

📦 Installation

Install Kokoro with pip:

pip install kokoro


💻 Usage Examples

Basic Usage

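# 1️⃣ Install kokoro (the specifier is quoted so the shell does not treat ">" as a redirect)
!pip install -q "kokoro>=0.3.4" soundfile
# 2️⃣ Install espeak, used for English out-of-dictionary (OOD) fallback and some non-English languages
!apt-get -qq -y install espeak-ng > /dev/null 2>&1
# 🇪🇸 'e' => Spanish es
# 🇫🇷 'f' => French fr-fr
# 🇮🇳 'h' => Hindi hi
# 🇮🇹 'i' => Italian it
# 🇧🇷 'p' => Brazilian Portuguese pt-br

# 3️⃣ Initialize a pipeline
from kokoro import KPipeline
from IPython.display import display, Audio
import soundfile as sf
# 🇺🇸 'a' => American English, 🇬🇧 'b' => British English
# 🇯🇵 'j' => Japanese: pip install misaki[ja]
# 🇨🇳 'z' => Mandarin Chinese: pip install misaki[zh]
pipeline = KPipeline(lang_code='a') # <= make sure lang_code matches the voice

# This text is for demonstration purposes only and was not seen during training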
text = '''
The sky above the port was the color of television, tuned to a dead channel.
"It's not like I'm using," Case heard someone say, as he shouldered his way through the crowd around the door of the Chat. "It's like my body's developed this massive drug deficiency."
It was a Sprawl voice and a Sprawl joke. The Chatsubo was a bar for professional expatriates; you could drink there for a week and never hear two words in Japanese.

These were to have an enormous impact, not only because they were associated with Constantine, but also because, as in so many other areas, the decisions taken by Constantine (or in his name) were to have great significance for centuries to come. One of the main issues was the shape that Christian churches were to take, since there was not, apparently, a tradition of monumental church buildings when Constantine decided to help the Christian church build a series of truly spectacular structures. The main form that these churches took was that of the basilica, a multipurpose rectangular structure, based ultimately on the earlier Greek stoa, which could be found in most of the great cities of the empire. Christianity, unlike classical polytheism, needed a large interior space for the celebration of its religious services, and the basilica aptly filled that need. We naturally do not know the degree to which the emperor was involved in the design of new churches, but it is tempting to connect this with the secular basilica that Constantine completed in the Roman forum (the so-called Basilica of Maxentius) and the one he probably built in Trier, in connection with his residence in the city at a time when he was still caesar.

[Kokoro](/kˈOkəɹO/) is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, [Kokoro](/kˈOkəɹO/) can be deployed anywhere from production environments to personal projects.
'''
# text = '「もしおれがただ偶然、そしてこうしようというつもりでなくここに立っているのなら、ちょっとばかり絶望するところだな」と、そんなことが彼の頭に思い浮かんだ。'
# text = '中國人民不信邪也不怕邪，不惹事也不怕事，任何外國不要指望我們會拿自己的核心利益做交易，不要指望我們會吞下損害我國主權、安全、發展利益的苦果！'
# text = 'Los partidos políticos tradicionales compiten con los populismos y los movimientos asamblearios.'
# text = 'Le dromadaire resplendissant déambulait tranquillement dans les méandres en mastiquant de petites feuilles vernissées.'
# text = 'ट्रांसपोर्टरों की हड़ताल लगातार पांचवें दिन जारी, दिसंबर से इलेक्ट्रॉनिक टोल कलेक्शनल सिस्टम'
# text = "Allora cominciava l'insonnia, o un dormiveglia peggiore dell'insonnia, che talvolta assumeva i caratteri dell'incubo."
# text = 'Elabora relatórios de acompanhamento cronológico para as diferentes unidades do Departamento que propõem contratos.'

# 4️⃣ Generate, display, and save audio files in a loop.
generator = pipeline(
    text, voice='af_heart', # <= change the voice here
    speed=1, split_pattern=r'\n+'
)
for i, (gs, ps, audio) in enumerate(generator):
    print(i)  # i => index
    print(gs) # gs => graphemes/text
    print(ps) # ps => phonemes
    display(Audio(data=audio, rate=24000, autoplay=i==0))
    sf.write(f'{i}.wav', audio, 24000) # save each audio file at 24 kHz
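
The usage example asks that `lang_code` match the chosen voice. A minimal sketch of such a check follows. It assumes that the first character of a voice name (the `a` in `af_heart`) is the language code, which is inferred from the names used in this example and is not a documented guarantee. `LANG_CODES` and `voice_matches_lang` are hypothetical helpers, not part of the kokoro package.

```python
# Language codes as listed in the comments of the usage example.
LANG_CODES = {
    "a": "American English",
    "b": "British English",
    "e": "Spanish",
    "f": "French",
    "h": "Hindi",
    "i": "Italian",
    "p": "Brazilian Portuguese",
    "j": "Japanese",
    "z": "Mandarin Chinese",
}


def voice_matches_lang(voice: str, lang_code: str) -> bool:
    """Return True if the voice name's first letter equals lang_code.

    Assumption: voice names such as 'af_heart' start with their language code.
    """
    if lang_code not in LANG_CODES:
        raise ValueError(f"unknown lang_code: {lang_code!r}")
    return bool(voice) and voice[0] == lang_code


if __name__ == "__main__":
    print(voice_matches_lang("af_heart", "a"))  # True
    print(voice_matches_lang("af_heart", "b"))  # False
```

Running a check like this before constructing the pipeline would surface a mismatched voice early, rather than after audio has been generated.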

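The call in the usage example passes `split_pattern=r'\n+'`. As a rough illustration of what that pattern does to an input string, the sketch below splits text on runs of newlines with Python's standard `re` module. Whether the pipeline applies the pattern in exactly this way internally is an assumption, and `split_segments` is a hypothetical helper.

```python
import re


def split_segments(text: str, split_pattern: str = r"\n+") -> list[str]:
    """Split text on the pattern and drop empty or whitespace-only pieces."""
    pieces = re.split(split_pattern, text.strip())
    return [p.strip() for p in pieces if p.strip()]


if __name__ == "__main__":
    sample = "First paragraph.\n\nSecond paragraph.\nThird line.\n"
    for i, segment in enumerate(split_segments(sample)):
        print(i, segment)
```

Under this reading, each newline-separated paragraph of the demo text would become one generated segment, and therefore one numbered `.wav` file.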
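📚 Detailed Documentation

Releases

Model	Released	Training data	Languages and voices	SHA256
v0.19	December 25, 2024	<100 hours	1 language, 10 voices	`3b0c392f`
v1.0	January 27, 2025	A few hundred hours	8 languages, 54 voices	`496dba11`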

Training cost	v0.19	v1.0	Total
A100 80GB GPU hours	500	500	1000
Average hourly rate	$0.80/h	$1.20/h	$1/h
Cost in USD	$400	$600	$1000
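
The totals in the training cost table follow from multiplying GPU hours by the average hourly rate for each release. The short sketch below reproduces that arithmetic from the table's own figures; `RUNS` and `training_cost` are illustrative names introduced here.

```python
# (GPU hours, average hourly rate in USD) per release, from the cost table.
RUNS = {
    "v0.19": (500, 0.80),
    "v1.0": (500, 1.20),
}


def training_cost(runs: dict[str, tuple[int, float]]) -> dict[str, float]:
    """Return per-release cost, total hours, total cost, and blended rate."""
    costs = {name: hours * rate for name, (hours, rate) in runs.items()}
    total_hours = sum(hours for hours, _ in runs.values())
    total_cost = sum(costs.values())
    return {
        **costs,
        "total_hours": total_hours,
        "total_cost": total_cost,
        # Blended rate is total cost divided by total hours.
        "blended_rate": total_cost / total_hours,
    }


if __name__ == "__main__":
    for key, value in training_cost(RUNS).items():
        print(f"{key}: {value:.2f}")
```

The $1/h figure in the Total column is the blended rate over both runs: $1000 divided by 1000 GPU hours.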

Model Information

Property	Details
Architecture	StyleTTS 2 (https://arxiv.org/abs/2306.07691) and ISTFTNet (https://arxiv.org/abs/2203.02395); decoder only: no diffusion, no encoder release
Architected by	Li et al. @ https://github.com/yl4579/StyleTTS2
Trained by	`@rzvzn` (Discord)
Supported languages	American English, British English, French, Hindi
Model SHA256 hash	`496dba118d1a58f5f3db2efc88dbdc216e0483fc89fe6e47ee1f2c53f18ad1e4`
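
The SHA256 hash listed above can be used to confirm that a downloaded weights file is intact. The following is a minimal sketch using only Python's standard `hashlib`; the function names are hypothetical, and the path of the weights file on disk is left to the caller because it is not stated here.

```python
import hashlib
from pathlib import Path

# Full SHA256 of the v1.0 weights, as listed in the model information table.
EXPECTED_SHA256 = "496dba118d1a58f5f3db2efc88dbdc216e0483fc89fe6e47ee1f2c53f18ad1e4"


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large weight files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Accept a full hash or a short prefix such as '496dba11'."""
    if not expected:
        raise ValueError("expected hash must not be empty")
    return sha256_of_file(path).startswith(expected.lower())
```

Calling `matches_expected(Path(...))` with the local path of the downloaded file returns `True` only when the file's digest begins with the expected value, so the eight-character prefixes shown in the release table work as well as the full hash.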

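Training Details

Training data: Kokoro was trained exclusively on permissive or non-copyrighted audio data and IPA (International Phonetic Alphabet) phoneme labels. Examples of permissive or non-copyrighted audio include:
- Public domain audio
- Audio licensed under Apache, MIT, or similar licenses
- Synthetic audio^[1] generated by closed^[2] TTS models from large providers
  [1] https://copyright.gov/ai/ai_policy_guidance.pdf
  [2] No synthetic audio from open TTS models or "custom voice clones"
Total dataset size: a few hundred hours of audio
Total training cost: about $1,000 for 1,000 hours of A100 80GB VRAM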