CSM 1B

Published by unsloth
CSM (Conversational Speech Model) is a 1B-parameter speech generation model developed by Sesame that generates RVQ audio codes from text and audio inputs.
Downloads: 2,667
Release Date: 5/15/2025

Model Overview

CSM is a speech generation model built from a Llama backbone and a lightweight audio decoder that produces Mimi audio codes. Fine-tuned variants of CSM support interactive speech demonstrations.
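To make "RVQ audio codes" concrete, here is a minimal sketch of residual vector quantization (RVQ) on a single toy frame. It is an illustration only, not Mimi's actual codec: the codebooks, the two-dimensional frame, and the function names (`nearest`, `rvq_encode`, `rvq_decode`) are all made up for this example. Each stage picks the nearest codebook entry for whatever residual the previous stage left behind, so one frame becomes a short list of integer codes.

```python
from typing import List, Sequence

Vector = List[float]


def nearest(codebook: Sequence[Vector], v: Vector) -> int:
    """Return the index of the codebook entry closest to v (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - x) ** 2 for c, x in zip(codebook[i], v)),
    )


def rvq_encode(codebooks: Sequence[Sequence[Vector]], v: Vector) -> List[int]:
    """Quantize v stage by stage; each stage encodes the residual left by the previous one."""
    residual = list(v)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codebooks: Sequence[Sequence[Vector]], codes: Sequence[int]) -> Vector:
    """Reconstruct an approximation of the input by summing the selected entries."""
    out = [0.0] * len(codebooks[0][0])
    for codebook, idx in zip(codebooks, codes):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Toy codebooks (hypothetical values): a coarse stage followed by a fine stage.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]],
    [[0.0, 0.0], [0.25, -0.25], [-0.25, 0.25]],
]

frame = [1.2, 0.8]
codes = rvq_encode(CODEBOOKS, frame)    # [1, 1]
approx = rvq_decode(CODEBOOKS, codes)   # [1.25, 0.75]
print(codes, approx)
```

A real codec works on much higher-dimensional frames with learned codebooks, but the idea is the same: the model predicts code indices, and a decoder turns them back into audio.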

Model Features

Efficient Performance
1.5x faster and 58% less memory usage with the Unsloth runtime
Context Awareness
Generation quality improves when earlier audio segments are supplied as context
Multi-speaker Support
Selects the speaker voice via the speaker parameter
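The context and speaker features above can be sketched as a small request builder. This is a hypothetical helper, not the model's API: `Turn`, `build_conversation`, the file path, and the message layout (a chat-style list where the role holds the speaker id) are assumptions made for illustration, so check the model card for the input format the model actually expects.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    """One earlier utterance used as context (hypothetical structure)."""
    speaker: int
    text: str
    audio_path: Optional[str] = None  # recorded audio for this turn, if available


def build_conversation(context: List[Turn], text: str, speaker: int) -> List[dict]:
    """Assemble context turns plus the new text to synthesize into one message list."""
    messages = []
    for turn in context:
        content = [{"type": "text", "text": turn.text}]
        if turn.audio_path is not None:
            content.append({"type": "audio", "path": turn.audio_path})
        messages.append({"role": str(turn.speaker), "content": content})
    # The final message has text only: it is the utterance to be generated.
    messages.append({"role": str(speaker), "content": [{"type": "text", "text": text}]})
    return messages


# Example: speaker 0 spoke earlier (with audio); speaker 1 replies.
context = [Turn(speaker=0, text="How are you today?", audio_path="turn_0.wav")]
conversation = build_conversation(context, "Doing well, thanks!", speaker=1)
print(conversation)
```

Passing an empty context list yields a single text-only message, which corresponds to plain text-to-speech without conditioning on earlier audio.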

Model Capabilities

Text-to-speech generation
Multi-speaker speech synthesis
Context-aware speech generation

Use Cases

Voice Interaction
Conversational Voice Assistant
Converts LLM-generated text into natural speech
Makes voice interaction feel more natural
Content Creation
Audio Content Generation
Converts text content into speech
Quickly generates podcasts, audiobooks, and other content