Slam

Developed by slprl
A speech language model built on discrete HuBERT tokens. It is designed for efficient training and can generate continuations of speech segments.
Release Time: 2/18/2025

Model Overview

This model is fine-tuned from Qwen/Qwen2.5-0.5B and operates on a vocabulary of 500 speech tokens extracted from the 11th layer of mhubert-25hz. It can generate speech segment continuations or serve as a base model for further tuning.
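
To make the token vocabulary concrete, the following minimal sketch estimates how many speech tokens a clip produces and checks that token ids fit the 500-entry vocabulary. It assumes that the "25hz" in mhubert-25hz means 25 tokens per second of audio; the helper names `expected_unit_count` and `validate_units` are hypothetical and are not part of any released library.

```python
# Assumption: "25hz" means one discrete speech token per 40 ms of audio.
UNITS_PER_SECOND = 25
# From the model description: 500 speech tokens.
VOCAB_SIZE = 500


def expected_unit_count(duration_seconds: float) -> int:
    """Approximate number of speech tokens for a clip of the given length."""
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    return int(duration_seconds * UNITS_PER_SECOND)


def validate_units(units: list[int]) -> None:
    """Raise ValueError if any token id is outside the 500-entry vocabulary."""
    for position, unit in enumerate(units):
        if not 0 <= unit < VOCAB_SIZE:
            raise ValueError(f"unit {unit} at position {position} is out of range")


if __name__ == "__main__":
    # A 10-second prompt corresponds to roughly 250 tokens under this assumption.
    print(expected_unit_count(10.0))
    validate_units([0, 17, 499])
```

Under this assumption, even a short prompt of a few seconds already occupies on the order of a hundred tokens of context.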

Model Features

Efficient Training
Uses the training recipe proposed in the 'Slamming' paper, which allows training to complete within one day on a single GPU.
Speech Token Processing
Uses a vocabulary of 500 discrete speech tokens extracted from the 11th layer of mhubert-25hz.
DPO Training
Trained with Direct Preference Optimization (DPO) on the SpokenSwag dataset to improve generation quality.
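
For readers unfamiliar with DPO, the sketch below computes the standard DPO loss for a single preference pair from sequence log-probabilities. It illustrates the general objective only and is not the authors' training code; the function name `dpo_loss` and the default `beta=0.1` are assumptions chosen for illustration.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value, not taken from the source
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full continuation
    under the trained policy or the frozen reference model.
    """
    # Implicit rewards: how much more likely each continuation is under
    # the policy than under the reference model.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)

    # loss = -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # The policy prefers the chosen continuation more than the reference does,
    # so the loss falls below log(2), its value at zero margin.
    print(dpo_loss(-5.0, -10.0, -8.0, -8.0))
```

The loss shrinks as the policy assigns relatively more probability to the preferred continuation than the reference model does, which is how preference data such as SpokenSwag pairs can steer generation quality.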

Model Capabilities

Generating speech segment continuations
Serving as a base model for fine-tuning speech language models

Use Cases

Speech Generation
Speech Story Continuation
Generates coherent follow-up content based on a given speech story segment.
Useful for audiobook creation or voice interaction applications.
Speech Dialogue Continuation
Generates natural responses in voice dialogue systems.
Enhances the naturalness and coherence of dialogue systems.