CSM 1B

Developed by Sesame
CSM is a 1-billion-parameter speech generation model developed by Sesame that generates RVQ (residual vector quantization) audio codes from text and audio inputs
Downloads 65.03k
Release Date: 3/6/2025

Model Overview

A conversational speech model that pairs a Llama backbone with a lightweight audio decoder that produces Mimi audio codes. It is suitable for text-to-speech tasks
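To make the term concrete: residual vector quantization (RVQ) represents a vector as a short sequence of codebook indices, where each stage quantizes the residual left over by the previous stage. The sketch below is a toy illustration in pure Python. The codebooks, dimensions, and function names are assumptions made for illustration only and do not reflect the Mimi codec or CSM's actual implementation.

```python
from typing import List, Sequence

Vector = List[float]
Codebook = List[Vector]


def nearest(codebook: Codebook, v: Sequence[float]) -> int:
    """Return the index of the codebook entry closest to v (squared L2 distance)."""
    def dist(entry: Vector) -> float:
        return sum((a - b) ** 2 for a, b in zip(entry, v))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def rvq_encode(v: Sequence[float], codebooks: List[Codebook]) -> List[int]:
    """Quantize v stage by stage; each stage encodes the remaining residual."""
    residual = list(v)
    codes = []
    for cb in codebooks:
        idx = nearest(cb, residual)
        codes.append(idx)
        # Subtract the chosen entry so the next stage only sees what is left.
        residual = [r - c for r, c in zip(residual, cb[idx])]
    return codes


def rvq_decode(codes: List[int], codebooks: List[Codebook]) -> Vector:
    """Reconstruct an approximation by summing the selected entries."""
    out = [0.0] * len(codebooks[0][0])
    for idx, cb in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, cb[idx])]
    return out


# Hypothetical two-stage codebooks: a coarse stage and a finer correction stage.
TOY_CODEBOOKS: List[Codebook] = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0], [1.0, -1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [-0.25, 0.0]],
]

codes = rvq_encode([1.2, 1.0], TOY_CODEBOOKS)
approx = rvq_decode(codes, TOY_CODEBOOKS)
print(codes, approx)
```

In a real codec the codebooks are learned and applied to each audio frame, so a clip becomes a grid of integer codes (frames by codebook stages) that a model can predict like tokens.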

Model Features

Context-aware Generation
Generates more natural conversational speech when given earlier audio segments of the conversation as context
Multi-voice Support
The base model can generate a variety of voices; a specific voice requires fine-tuning
Efficient Architecture
Combines a Llama backbone with a lightweight audio decoder to balance performance and efficiency
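As a rough illustration of context-aware generation, the sketch below shows one way a caller might package earlier conversation turns (speaker, transcript, audio) alongside the new text to be spoken. It is a minimal sketch under stated assumptions: `Segment`, `GenerationRequest`, and `build_request` are hypothetical names, the audio is a placeholder list of samples, and no model is loaded or run.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    """One earlier turn of the conversation (hypothetical structure)."""
    speaker: int
    text: str
    audio: Optional[List[float]] = None  # placeholder for waveform samples


@dataclass
class GenerationRequest:
    """The text to speak, the target speaker, and the contextual segments."""
    text: str
    speaker: int
    context: List[Segment] = field(default_factory=list)


def build_request(history: List[Segment], text: str, speaker: int,
                  max_context: int = 4) -> GenerationRequest:
    """Keep only the most recent turns that carry audio as context."""
    with_audio = [s for s in history if s.audio is not None]
    # Guard against max_context <= 0, since a [-0:] slice would keep everything.
    recent = with_audio[-max_context:] if max_context > 0 else []
    return GenerationRequest(text=text, speaker=speaker, context=recent)


history = [
    Segment(speaker=0, text="Hi there.", audio=[0.0, 0.1]),
    Segment(speaker=1, text="Hello!", audio=[0.2]),
    Segment(speaker=0, text="A turn with no recorded audio."),
]
request = build_request(history, "How are you?", speaker=1, max_context=1)
print(len(request.context), request.context[0].text)
```

Limiting the context to a few recent turns is a design choice in this sketch, not a documented requirement of the model.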

Model Capabilities

Text-to-speech generation
Conversational speech synthesis
Multi-speaker voice generation

Use Cases

Voice Interaction
Virtual Assistant
Generates natural speech responses for dialogue systems
The demo shows smooth conversational interaction
Content Creation
Audio Content Generation
Converts text content into speech