
TTS 1.6B En Fr

Developed by Kyutai
The Kyutai Text-to-Speech (TTS) model is a streaming text-to-speech model that supports real-time speech generation in English and French.
Downloads 1,441
Release Time: 6/30/2025

Model Overview

The model uses a hierarchical Transformer architecture and performs streaming text-to-speech generation in English and French. It is designed for fast generation and allows the output voice to be adjusted.
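The page only says "hierarchical Transformer". The sketch below shows one common way such a hierarchy is organized for audio that is represented as several codebook tokens per frame: a large temporal model steps once per audio frame, and a small depth model fills in the tokens of that frame one codebook at a time. This layout is an assumption, not a description of Kyutai's code; temporal_step, depth_step, and the toy functions are hypothetical stand-ins for neural networks.

```python
from typing import Callable, List

Frame = List[int]  # one audio frame = one token per codebook (assumed layout)


def generate_frames(
    temporal_step: Callable[[List[Frame]], List[float]],
    depth_step: Callable[[List[float], List[int]], int],
    num_frames: int,
    num_codebooks: int,
) -> List[Frame]:
    """Two-level autoregressive decoding sketch (hypothetical stand-in networks)."""
    frames: List[Frame] = []
    for _ in range(num_frames):
        # Temporal level: one large step per frame, conditioned on all past frames.
        context = temporal_step(frames)
        frame: List[int] = []
        for _ in range(num_codebooks):
            # Depth level: small steps over the codebook tokens of the current frame.
            frame.append(depth_step(context, frame))
        frames.append(frame)
    return frames


# Toy stand-ins so the control flow can be run and inspected.
def toy_temporal(frames: List[Frame]) -> List[float]:
    return [float(len(frames))]


def toy_depth(context: List[float], partial: List[int]) -> int:
    return int(context[0]) * 10 + len(partial)


frames = generate_frames(toy_temporal, toy_depth, num_frames=3, num_codebooks=4)
print(frames)  # [[0, 1, 2, 3], [10, 11, 12, 13], [20, 21, 22, 23]]
```

The design point is cost: the expensive temporal model runs once per frame rather than once per token, while the cheap depth model handles the tokens inside each frame.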

Model Features

Streaming speech generation
The model does not need the complete text before it starts speaking: audio output can begin after the first few words are received, which reduces latency.
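To illustrate what "start after the first few words" means for a caller, here is a toy sketch of incremental synthesis with a small lookahead buffer. The lookahead size, the word-level granularity, and the string "audio chunks" are all hypothetical; the page does not say how much right context the model needs or how it is implemented internally.

```python
from typing import Iterable, Iterator, List


def stream_tts(words: Iterable[str], lookahead: int = 2) -> Iterator[str]:
    """Yield a placeholder audio chunk per word as soon as `lookahead` later
    words have arrived, instead of waiting for the full text (toy sketch)."""
    buffer: List[str] = []
    for word in words:
        buffer.append(word)
        if len(buffer) > lookahead:
            # Enough right context: synthesize the oldest buffered word now.
            yield f"audio({buffer.pop(0)})"
    # Input finished: flush the words still waiting in the buffer.
    for word in buffer:
        yield f"audio({word})"


print(list(stream_tts(["Bonjour", "et", "bienvenue", "à", "tous"], lookahead=2)))
# ['audio(Bonjour)', 'audio(et)', 'audio(bienvenue)', 'audio(à)', 'audio(tous)']
```

Because stream_tts is a generator, the first chunk is produced after only lookahead + 1 words have been read, while the rest of the text may still be arriving (for example, from a language model that is generating it).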
Multilingual support
Supports text-to-speech for both English and French.
Efficient generation
The model is trained with classifier-free guidance (CFG) distillation, which speeds up generation and makes batch processing easier. It can generate about 75 seconds of audio per second of compute.
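Why CFG distillation helps with speed and batching: standard classifier-free guidance needs two forward passes per generation step (one with the conditioning and one without) and then combines them, while a distilled model is trained to produce the guided output directly in one pass. The sketch below shows the usual guidance formula applied to logits and the resulting pass count. Applying guidance at the logit level, the scale gamma, and the helper forward_passes are illustrative assumptions; the page does not give Kyutai's settings.

```python
from typing import List


def cfg_logits(cond: List[float], uncond: List[float], gamma: float) -> List[float]:
    """Classifier-free guidance: move the prediction from the unconditional
    logits toward (and past, if gamma > 1) the conditional logits."""
    return [u + gamma * (c - u) for c, u in zip(cond, uncond)]


def forward_passes(batch_size: int, steps: int, distilled: bool) -> int:
    """Count model forward passes (hypothetical helper). Undistilled CFG runs a
    conditional and an unconditional pass per item per step; a distilled model
    matches the guided output with a single pass."""
    per_item = 1 if distilled else 2
    return batch_size * steps * per_item


print(cfg_logits([2.0, 0.0], [1.0, 0.5], gamma=3.0))  # [4.0, -1.0]
print(forward_passes(8, 100, distilled=False), forward_passes(8, 100, distilled=True))  # 1600 800
```

Halving the passes per step also means every request in a batch needs the same single pass, so the unconditional branch no longer has to be carried alongside each item.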
Speech adjustment
Supports adjusting the output voice through precomputed embeddings.
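The page states only that voice adjustment uses precomputed embeddings. A minimal sketch, assuming the embedding is a short sequence of vectors computed offline and that the decoder reads it through single-head cross-attention: both assumptions are for illustration only. VOICE_BANK and its toy vectors, cross_attend, and condition are hypothetical names, not part of any released API.

```python
import math
from typing import Dict, List

Vector = List[float]

# Hypothetical store of precomputed voice embeddings (toy values). Because they
# are computed offline, no voice encoder has to run at request time.
VOICE_BANK: Dict[str, List[Vector]] = {
    "voice_a": [[1.0, 0.0], [0.0, 1.0]],
    "voice_b": [[0.5, 0.5], [-0.5, 0.5]],
}


def cross_attend(query: Vector, memory: List[Vector]) -> Vector:
    """Single-head scaled dot-product attention of one decoder state over the
    voice embedding sequence (keys and values are the same vectors here)."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in memory]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(memory[0])
    return [sum(w * v[i] for w, v in zip(weights, memory)) for i in range(dim)]


def condition(state: Vector, voice: str) -> Vector:
    """Residual update: add the attended voice information to the decoder state."""
    attended = cross_attend(state, VOICE_BANK[voice])
    return [s + a for s, a in zip(state, attended)]


print(condition([1.0, 0.0], "voice_a"))
print(condition([1.0, 0.0], "voice_b"))
```

Switching voices then amounts to looking up a different entry in the bank; the model weights stay the same.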

Model Capabilities

Streaming text-to-speech
Multilingual speech generation
Real-time speech output
Speech style adjustment

Use Cases

Real-time dialogue
Speech generation in dialogue scenarios
Generate speech responses in real time in dialogue scenarios to improve the interaction experience.
Achieve low-latency speech output
Multilingual applications
Multilingual speech synthesis
Generate natural speech for English and French content.
Support smooth speech output in both languages