
CSM 1B

Developed by eustlb
CSM (Conversational Speech Model) is a 1B-parameter speech generation model developed by Sesame. It generates RVQ audio codes from text and audio inputs, which lets it condition the generated speech on conversational context.
Downloads 5,144
Release Time: 3/26/2025

Model Overview

CSM pairs a Llama backbone with a lightweight audio decoder and outputs Mimi audio codes, which makes it suitable for text-to-speech tasks.
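Mimi audio codes are residual vector quantization (RVQ) codes: each quantizer stage encodes whatever error the previous stage left behind, so one audio frame becomes a short list of codebook indices. The sketch below illustrates that idea only. The two codebooks (COARSE, FINE) and the 2-dimensional vectors are hypothetical toy values chosen for readability; they are not Mimi's actual codebooks, dimensions, or implementation.

```python
from typing import List, Sequence

Vector = List[float]


def nearest(codebook: Sequence[Vector], x: Vector) -> int:
    """Return the index of the codebook entry closest to x (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], x)),
    )


def rvq_encode(x: Vector, codebooks: Sequence[Sequence[Vector]]) -> List[int]:
    """Quantize x stage by stage; each stage encodes the residual of the previous one."""
    residual = list(x)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen entry so the next stage only sees the remaining error.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codes: Sequence[int], codebooks: Sequence[Sequence[Vector]]) -> Vector:
    """Reconstruct the vector by summing the selected entry from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical toy codebooks: a coarse first stage and a finer second stage.
COARSE = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
FINE = [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [-0.25, -0.25]]

codes = rvq_encode([1.2, 1.0], [COARSE, FINE])
reconstruction = rvq_decode(codes, [COARSE, FINE])
print(codes)           # [1, 1]
print(reconstruction)  # [1.25, 1.0]
```

The first stage picks the coarse entry nearest the input, and the second stage refines the result. Adding more stages would shrink the reconstruction error further at the cost of more codes per frame.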

Model Features

Context-Aware Generation
Accepts the audio and text of earlier dialogue turns as context, so the current utterance is generated to fit the conversation so far.
Efficient Architecture Design
Combines a Llama backbone with a lightweight decoder to balance generation quality against computational cost.
Multimodal Input
Processes text and audio inputs together, which supports more natural spoken interaction.
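One way to picture the context-aware, multimodal input is as a list of dialogue turns, where earlier turns carry both text and recorded audio and the final turn carries only the text to be spoken. The sketch below assembles such a list in plain Python. The message layout (a "role" field holding a speaker id, and "text" / "audio" content entries) is an assumption modeled on common chat-style schemas, and the helper names and file paths are hypothetical; this page does not specify the exact format any particular library expects.

```python
from typing import Dict, List, Optional, Tuple

# One history entry: (speaker id, transcript, path to that turn's audio).
HistoryTurn = Tuple[int, str, str]


def make_turn(speaker_id: int, text: str, audio_path: Optional[str] = None) -> Dict:
    """Build one dialogue turn; audio is attached only when a path is given."""
    content = [{"type": "text", "text": text}]
    if audio_path is not None:
        content.append({"type": "audio", "path": audio_path})
    # Assumption: the speaker id is carried as a string in the "role" field.
    return {"role": str(speaker_id), "content": content}


def build_conversation(
    history: List[HistoryTurn], next_speaker: int, next_text: str
) -> List[Dict]:
    """Return the context turns followed by the text-only turn to synthesize."""
    turns = [make_turn(speaker, text, audio) for speaker, text, audio in history]
    turns.append(make_turn(next_speaker, next_text))
    return turns


# Hypothetical two-speaker history; the .wav paths are placeholders.
conversation = build_conversation(
    history=[
        (0, "Hi there.", "turn_0.wav"),
        (1, "Hello! How can I help?", "turn_1.wav"),
    ],
    next_speaker=0,
    next_text="Could you read me the news?",
)

for turn in conversation:
    kinds = [item["type"] for item in turn["content"]]
    print(turn["role"], kinds)
```

Keeping the speaker id on every turn is what allows a multi-speaker exchange to be represented: the final, text-only turn names the speaker whose voice should be continued.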

Model Capabilities

Text-to-speech generation
Context-aware speech synthesis
Multi-speaker speech generation

Use Cases

Interactive Voice Applications
Voice Assistants
Provides natural speech output for dialogue systems.
Demos show the model generating speech with emotional intonation.
Content Creation
Audiobook Generation
Automatically converts text content into speech.