
CSM 1B

Published by chutesai
CSM (Conversational Speech Model) is a 1-billion-parameter speech generation model developed by Sesame that generates RVQ audio codes from text and audio inputs.
Downloads: 814
Release date: 3/18/2025

Model Overview

CSM combines a Llama backbone with a lightweight audio decoder. It generates Mimi audio codes from text and audio inputs, which makes it suitable for text-to-speech tasks.
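To illustrate what "RVQ audio codes" are, here is a minimal sketch of residual vector quantization (RVQ) on toy 2-D vectors. The codebooks and the frame values are made up for illustration; a real neural codec such as Mimi learns its codebooks and quantizes learned latent vectors of audio, not hand-picked numbers. Each level quantizes whatever the previous levels left unexplained, so one frame becomes a short list of integer indices, one per codebook. Sequences of such indices are what CSM generates.

```python
from typing import List, Sequence

Vector = List[float]

# Toy codebooks (made up for illustration): each level holds finer corrections
# than the one before it. A real codec learns these from data.
CODEBOOKS: List[List[Vector]] = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],        # level 0: coarse
    [[0.0, 0.0], [0.25, -0.25], [-0.25, 0.25]],   # level 1: finer
    [[0.0, 0.0], [0.05, -0.05], [-0.05, 0.05]],   # level 2: finest
]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rvq_encode(x: Sequence[float], codebooks: Sequence[Sequence[Vector]]) -> List[int]:
    """Quantize one frame into one code index per codebook level."""
    residual = list(x)
    codes: List[int] = []
    for codebook in codebooks:
        # Pick the codeword closest to what earlier levels left unexplained.
        idx = min(range(len(codebook)), key=lambda i: sq_dist(residual, codebook[i]))
        codes.append(idx)
        # Subtract the chosen codeword; the next level quantizes the remainder.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codes: Sequence[int], codebooks: Sequence[Sequence[Vector]]) -> Vector:
    """Rebuild a frame by summing the chosen codeword from each level used."""
    out = [0.0] * len(codebooks[0][0])
    for level, idx in enumerate(codes):
        out = [o + c for o, c in zip(out, codebooks[level][idx])]
    return out


if __name__ == "__main__":
    frame = [1.3, 0.72]
    codes = rvq_encode(frame, CODEBOOKS)
    print("codes:", codes)
    # Decoding with more levels gives a closer reconstruction.
    for k in range(1, len(codes) + 1):
        approx = rvq_decode(codes[:k], CODEBOOKS)
        print(k, "levels ->", approx, "error:", sq_dist(frame, approx))
```

For this toy frame the encoder picks index 1 at every level, and the reconstruction error shrinks as more levels are summed. That coarse-to-fine structure is why RVQ codecs can trade bitrate for fidelity by keeping more or fewer codebooks.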

Model Features

Multi-voice generation
The base generation model can produce a variety of voices, and contextual prompts can be used to steer and improve the voice it produces.
Context-aware
Providing conversational context (text + audio) can significantly improve generation quality.
Efficient architecture
Pairs a Llama backbone with a lightweight audio decoder to balance output quality and efficiency.
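To make the context-aware feature concrete, here is a minimal sketch of one plausible way to assemble conversational context: each earlier turn contributes its text followed by its audio codes, the most recent turns are kept when space runs out, and the line to be spoken goes last. The `Segment` class, `build_prompt` function, and `max_items` budget are hypothetical names chosen for illustration, not CSM's actual interface, and the exact ordering and tokenization CSM uses internally are assumptions here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

# ("text" or "audio", speaker id, payload); the payload is a string for text
# or one frame of RVQ code indices for audio.
PromptItem = Tuple[str, int, Union[str, List[int]]]


@dataclass
class Segment:
    """One earlier conversational turn (hypothetical structure)."""
    speaker: int
    text: str
    audio_frames: List[List[int]] = field(default_factory=list)


def segment_items(seg: Segment) -> List[PromptItem]:
    """A turn's text first, followed by its audio frames."""
    items: List[PromptItem] = [("text", seg.speaker, seg.text)]
    items.extend(("audio", seg.speaker, frame) for frame in seg.audio_frames)
    return items


def build_prompt(context: List[Segment], target_text: str, speaker: int,
                 max_items: int = 64) -> List[PromptItem]:
    """Keep the newest whole turns that fit in the budget, in chronological
    order, then append the text to be spoken. Audio for the new turn would be
    generated after the last item."""
    chosen: List[List[PromptItem]] = []
    used = 1  # reserve one slot for the target text
    for seg in reversed(context):  # walk from the most recent turn backwards
        items = segment_items(seg)
        if used + len(items) > max_items:
            break  # drop this turn and everything older than it
        chosen.append(items)
        used += len(items)
    # Restore chronological order before flattening.
    prompt = [item for items in reversed(chosen) for item in items]
    prompt.append(("text", speaker, target_text))
    return prompt


if __name__ == "__main__":
    history = [
        Segment(0, "Hey, how was the trip?", [[1, 4], [2, 7]]),
        Segment(1, "Great, thanks for asking!", [[3, 5], [3, 6], [0, 2]]),
    ]
    for item in build_prompt(history, "Glad to hear it.", speaker=0):
        print(item)
```

Trimming whole turns rather than individual items keeps each turn's text paired with its audio, so the model never sees audio whose transcript was cut off.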

Model Capabilities

Text-to-speech
Multi-voice speech generation
Context-aware speech synthesis

Use Cases

Voice interaction
Conversational voice assistant
Convert LLM-generated text into natural-sounding speech
Deliver more natural voice interactions
Content creation
Audio content generation
Automatically convert text content into speech
Efficiently generate audiobooks, podcasts, and other audio content