
Slam Scaled

Developed by slprl
A high-quality speech language model fine-tuned from Qwen2.5-0.5B that uses discrete HuBERT tokens as its vocabulary, built on a training recipe efficient enough to run on a single GPU within 24 hours
Downloads 792
Release Time: 2/18/2025

Model Overview

A speech language model focused on generating speech segments, operating on discrete speech tokens to keep both training and inference efficient
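As a rough illustration of how such a model can be used, the sketch below loads the checkpoint as a standard causal LM with Hugging Face transformers and samples a continuation for a short sequence of discrete speech tokens. The model id "slprl/slam_scaled" and the "<unit_N>" token spelling are assumptions for illustration only; check the model card for the actual identifiers, and note that a separate vocoder is still needed to turn the generated tokens into audio.

```python
# Minimal sketch, assuming the checkpoint is published as a standard causal LM and that
# speech tokens are written as "<unit_N>" strings; consult the model card for specifics.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "slprl/slam_scaled"  # assumed model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

# A short prompt of discrete speech tokens produced by the mhubert-25hz tokenizer.
prompt = "<unit_17><unit_230><unit_5><unit_412>"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample a continuation; a vocoder (not shown) turns the new tokens back into audio.
out = model.generate(**inputs, max_new_tokens=100, do_sample=True, top_p=0.95)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```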

Model Features

Efficient Training
The underlying Slam recipe trains a high-quality model on a single academic-grade GPU within 24 hours
Speech Token Processing
Uses a vocabulary of 500 discrete speech tokens extracted from mhubert-25hz (see the extraction sketch after this list)
Multi-Stage Optimization
Combines pre-training with DPO preference optimization to improve generation quality (a DPO sketch also follows this list)
Low Resource Requirements
The scaled variant described here needs only 2 A100 GPUs for 48 hours of training, keeping computational cost low
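The speech-token vocabulary comes from discretizing self-supervised speech features. The sketch below is a minimal, hypothetical pipeline: extract frame-level HuBERT features and assign each frame to the nearest of 500 k-means centroids. The checkpoint name, the chosen hidden layer, and the "kmeans_centroids.npy" file are assumptions, not the model's published tooling.

```python
# Hypothetical sketch of tokenization: HuBERT frame features quantized against
# 500 k-means centroids. Checkpoint name, layer index, and centroid file are assumed.
import numpy as np
import torch
import torchaudio
from transformers import AutoFeatureExtractor, HubertModel

hubert_id = "slprl/mhubert-base-25hz"  # assumed checkpoint name
extractor = AutoFeatureExtractor.from_pretrained(hubert_id)
hubert = HubertModel.from_pretrained(hubert_id).eval()

wav, sr = torchaudio.load("example.wav")
wav = torchaudio.functional.resample(wav, sr, 16_000).mean(dim=0)

with torch.no_grad():
    inputs = extractor(wav.numpy(), sampling_rate=16_000, return_tensors="pt")
    feats = hubert(**inputs, output_hidden_states=True).hidden_states[11][0]  # assumed layer

# Each frame becomes the index of its nearest centroid, i.e. one of 500 speech tokens.
centroids = torch.from_numpy(np.load("kmeans_centroids.npy")).float()  # (500, hidden_dim)
units = torch.cdist(feats, centroids).argmin(dim=-1).tolist()
print(units[:20])
```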
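For the DPO preference-optimization stage, a hedged sketch using the trl library is shown below. The preference pairs, token strings, and hyperparameters are placeholders, not the released training recipe, and the tokenizer argument name differs across trl versions.

```python
# Hedged sketch of a DPO stage with `trl`; all data and hyperparameters are placeholders.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "slprl/slam_scaled"  # assumed model id
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Preference pairs over speech-token strings: one preferred and one dispreferred continuation.
pairs = Dataset.from_dict({
    "prompt":   ["<unit_12><unit_7><unit_301>"],
    "chosen":   ["<unit_44><unit_44><unit_9>"],
    "rejected": ["<unit_2><unit_87><unit_87>"],
})

config = DPOConfig(output_dir="slam-dpo", beta=0.1, per_device_train_batch_size=4)
trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=pairs,
    processing_class=tokenizer,  # named `tokenizer=` in older trl releases
)
trainer.train()
```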

Model Capabilities

Speech Segment Generation
Speech Continuation Prediction
Speech Token Processing

Use Cases

Speech Generation
Speech Story Continuation
Generates coherent follow-up content for a given speech segment
Achieves 61.30% accuracy on the sStoryCloze test set (see the evaluation sketch after this list)
Speech Interaction System
Serves as the generation component for speech dialogue systems
Educational Applications
Language Learning Assistance
Generates speech practice materials
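Spoken StoryCloze-style accuracy is commonly computed by comparing the model's likelihood of the true spoken ending against a distractor ending. The sketch below shows that comparison in a minimal form; the model id and the placeholder "<unit_N>" strings are assumptions.

```python
# Hedged sketch of a spoken StoryCloze-style check: the true ending should receive a
# higher length-normalized log-likelihood than a distractor ending.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "slprl/slam_scaled"  # assumed model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

def continuation_logprob(prefix: str, continuation: str) -> float:
    """Average log-probability of `continuation` given the speech-token `prefix`."""
    prefix_len = tokenizer(prefix, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prefix + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)
    token_lp = logprobs.gather(-1, full_ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return token_lp[:, prefix_len - 1:].mean().item()  # score only the continuation

story = "<unit_3><unit_19><unit_19><unit_77>"  # placeholder speech-token prefix
true_end, distractor = "<unit_8><unit_8>", "<unit_401>"
correct = continuation_logprob(story, true_end) > continuation_logprob(story, distractor)
print("correct" if correct else "wrong")
```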