# Real-time speech generation

## Qwen2.5 Omni 7B AWQ

Qwen2.5-Omni is an end-to-end multimodal model that perceives text, images, audio, and video, and generates text and natural speech responses in a streaming manner.

- License: Other
- Tags: Multimodal Fusion, Transformers, English
- Author: Qwen

## Spark TTS 0.5B 8bit

An 8-bit text-to-speech model in MLX format, converted from prince-canuma/Spark-TTS-0.5B, that supports both English and Chinese.

- Tags: Speech Synthesis, Supports Multiple Languages
- Author: mlx-community

## Spark TTS 0.5B 4 6bit

Spark-TTS-0.5B-4-6bit is a text-to-speech model in MLX format that supports both English and Chinese.

- Tags: Speech Synthesis, Supports Multiple Languages
- Author: mlx-community

## Muyan TTS SFT Q8 0 GGUF

A text-to-speech model in GGUF format, converted from MYZY-AI/Muyan-TTS-SFT, that supports Chinese speech synthesis.

- Tags: Speech Synthesis
- Author: NikolayKozloff

## Kokorotts

Kokoro is an open-source text-to-speech model with 82 million parameters. Its lightweight architecture delivers sound quality comparable to much larger models while being significantly faster and cheaper to run.

- License: Apache-2.0
- Tags: Speech Synthesis, English
- Author: Daemontatox

## Llasa 1B Q8 0 GGUF

A GGUF conversion of HKUST-Audio/Llasa-1B, intended primarily for text-to-speech tasks.

- Tags: Speech Synthesis, Supports Multiple Languages
- Author: NikolayKozloff

## Hindi Text To Speech Tts

A Hindi text-to-speech model fine-tuned from microsoft/speecht5_tts.

- License: MIT
- Tags: Speech Synthesis, Transformers
- Author: ShigrafS

## XTTS V2 Argentinian Spanish

ⓍTTS is a speech generation model that can clone a voice from just 6 seconds of audio and reuse it across languages, without needing hours of training data.

- License: Other
- Tags: Speech Synthesis, Spanish
- Author: marianbasti

## Mms Tts Nova Train

A text-to-speech (TTS) model for the Shan language that converts Shan text into natural speech.

- License: CC
- Tags: Speech Synthesis, Transformers, Other
- Author: NorHsangPha

## Speecht5 Tts Commonvoice Ca

A Catalan text-to-speech model based on the SpeechT5 architecture, fine-tuned on the Common Voice 11.0 dataset.

- License: MIT
- Tags: Speech Synthesis, Transformers, Other
- Author: wetdog

## Tts Hifigan

HiFiGAN is a generative adversarial network (GAN) that generates high-quality audio from mel-spectrograms, making it suitable as the vocoder stage of a text-to-speech system.

- Tags: Speech Synthesis, English
- Author: nvidia

## Hifigan Lj V1

A HiFi-GAN vocoder model trained on the LJ Speech dataset for high-quality speech synthesis.

- Tags: Speech Synthesis, Transformers, English
- Author: jaketae

© 2025 AIbase