CSM 1B Safetensors FP16

Developed by lunahr
CSM (Conversational Speech Model) is a 1-billion-parameter speech generation model developed by Sesame that generates RVQ audio codes from text and audio inputs.
Downloads 79
Release Time: 4/25/2025

Model Overview

This model uses a Llama backbone and a lightweight audio decoder to generate Mimi audio codes, making it suitable for text-to-speech tasks.
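
To make the term "RVQ audio codes" more concrete, here is a minimal sketch of residual vector quantization (RVQ) in plain Python. It only illustrates the general idea: each stage picks the nearest codebook entry for whatever the previous stages left unexplained, and the decoder sums the chosen entries. The codebooks, vector sizes, and function names (`rvq_encode`, `rvq_decode`, `TOY_CODEBOOKS`) are made up for this example and are not the actual Mimi codec or the CSM implementation.

```python
from typing import List, Sequence

Vector = Sequence[float]
Codebook = Sequence[Vector]


def nearest_index(codebook: Codebook, vec: Vector) -> int:
    """Return the index of the codebook entry closest to vec (squared L2)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def rvq_encode(vec: Vector, codebooks: Sequence[Codebook]) -> List[int]:
    """Quantize vec stage by stage; each stage encodes the remaining residual."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest_index(codebook, residual)
        codes.append(idx)
        # Subtract the chosen entry so the next stage sees only the leftover error.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codes: Sequence[int], codebooks: Sequence[Codebook]) -> List[float]:
    """Reconstruct a vector by summing the selected entry from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical two-stage codebooks: a coarse stage, then a finer correction stage.
TOY_CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, -0.25]],
]

codes = rvq_encode([1.2, 0.9], TOY_CODEBOOKS)
reconstructed = rvq_decode(codes, TOY_CODEBOOKS)
print(codes, reconstructed)
```

In this toy setup, one input vector becomes a short list of integer codes, one per stage. A speech model that generates such codes is predicting these integers, and a separate codec decoder turns them back into audio.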

Model Features

Multi-speaker Support
The model supports specifying different speaker IDs to generate voices with different timbres.
Context Awareness
The model can use conversational context to generate more natural speech output.
Efficient Architecture
Built on a Llama backbone and a lightweight decoder, balancing performance and efficiency.
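
The speaker ID and context features above can be pictured as a small data structure: each prior utterance carries a speaker ID and its text, and the new utterance is appended to the most recent turns. The sketch below is an assumption-laden illustration only. The `Turn` class, the `build_prompt` helper, the `max_context_turns` parameter, and the bracketed `[speaker]text` prefix are hypothetical choices for this example, not a documented API of this model, and real context for a speech model would also include the audio of earlier turns.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Turn:
    """One utterance in a conversation (hypothetical structure)."""
    speaker: int  # integer speaker ID selecting the voice
    text: str     # what was (or will be) said


def format_turn(turn: Turn) -> str:
    """Prefix the text with its speaker ID; the bracket format is an assumption."""
    return f"[{turn.speaker}]{turn.text}"


def build_prompt(
    context: Sequence[Turn], next_turn: Turn, max_context_turns: int = 4
) -> List[str]:
    """Keep only the most recent context turns, then append the turn to synthesize."""
    # Guard against max_context_turns == 0, since context[-0:] would keep everything.
    recent = list(context[-max_context_turns:]) if max_context_turns > 0 else []
    return [format_turn(t) for t in [*recent, next_turn]]


history = [
    Turn(speaker=0, text="Hi there."),
    Turn(speaker=1, text="Hello, how are you?"),
]
prompt = build_prompt(history, Turn(speaker=0, text="Doing well."))
print(prompt)
```

Limiting the number of context turns is one simple way to keep the input within a fixed length budget while still giving the model recent conversational history.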

Model Capabilities

Text-to-Speech
Multi-speaker Voice Generation
Context-aware Speech Synthesis

Use Cases

Interactive Voice Applications
Voice Assistants
Provides natural voice output for virtual assistants.
Demonstrated in Sesame's interactive voice demos.
Dialogue Systems
Generates coherent conversational speech.
Can adjust voice style based on context.