SpeechGPT 7B Cm

Developed by fnlp
SpeechGPT is a large language model with intrinsic cross-modal dialogue capabilities. It can perceive and generate multimodal content and supports interaction through both speech and text.
Downloads 47
Release Time: 9/14/2023

Model Overview

Through discrete speech representation and a three-stage training strategy (modality adaptation pre-training, cross-modal instruction fine-tuning, and modality-chain instruction fine-tuning), SpeechGPT achieves alignment between speech and text, enabling it to handle various cross-modal tasks.
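To make "discrete speech representation" concrete, here is a minimal sketch of how a sequence of speech unit ids could be rendered as tokens that a text language model can read and emit. The marker names `<sosp>` and `<eosp>`, the `<id>` token format, and the helper names `units_to_tokens` and `tokens_to_units` are assumptions for illustration, not the model's confirmed vocabulary. The unit ids themselves would come from a separate speech tokenizer, which is not shown.

```python
import re
from itertools import groupby

# Hypothetical boundary markers for a speech span; the real vocabulary may differ.
SPEECH_START = "<sosp>"
SPEECH_END = "<eosp>"


def units_to_tokens(unit_ids, collapse_repeats=True):
    """Render discrete speech unit ids as a string of pseudo-text tokens."""
    if collapse_repeats:
        # Adjacent frames often map to the same unit; keep one copy per run.
        unit_ids = [unit for unit, _ in groupby(unit_ids)]
    body = "".join(f"<{unit}>" for unit in unit_ids)
    return f"{SPEECH_START}{body}{SPEECH_END}"


def tokens_to_units(token_string):
    """Recover the unit ids from a string produced by units_to_tokens."""
    return [int(match) for match in re.findall(r"<(\d+)>", token_string)]


if __name__ == "__main__":
    frame_units = [5, 5, 5, 12, 12, 7]
    encoded = units_to_tokens(frame_units)
    print(encoded)                   # <sosp><5><12><7><eosp>
    print(tokens_to_units(encoded))  # [5, 12, 7]
```

Once speech is written this way, speech and text share one token stream, which is what allows a single model to be trained on both modalities.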

Model Features

Cross-modal Dialogue Capability
Capable of processing both speech and text input/output, enabling cross-modal interaction.
Three-stage Training Strategy
Enhances model performance progressively through modality adaptation pre-training, cross-modal instruction fine-tuning, and modality-chain instruction fine-tuning.
Large-scale Speech Instruction Dataset
Introduces the SpeechInstruct dataset, which contains both cross-modal instructions and modality-chain instructions.
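The following sketch shows what a single modality-chain training example might look like. It assumes that "modality-chain" means the model receives a spoken instruction, transcribes it, answers in text, and then renders the answer as speech. The `ChainTurn` class, the `build_chain_example` helper, and the bracketed role labels are all hypothetical; the actual SpeechInstruct format may differ.

```python
from dataclasses import dataclass


@dataclass
class ChainTurn:
    """One hypothetical modality-chain sample; speech fields hold unit-token strings."""
    speech_instruction: str
    text_instruction: str
    text_response: str
    speech_response: str


def build_chain_example(turn):
    """Build a prompt/target pair where the target walks speech -> text -> text -> speech."""
    prompt = f"[Human]: {turn.speech_instruction}"
    # The target spells out the intermediate text steps before the spoken reply.
    target = "\n".join([
        f"[Transcript]: {turn.text_instruction}",
        f"[Text reply]: {turn.text_response}",
        f"[Speech reply]: {turn.speech_response}",
    ])
    return {"prompt": prompt, "target": target}


if __name__ == "__main__":
    turn = ChainTurn(
        speech_instruction="<sosp><5><12><7><eosp>",
        text_instruction="What is the capital of France?",
        text_response="The capital of France is Paris.",
        speech_response="<sosp><31><8><44><eosp>",
    )
    example = build_chain_example(turn)
    print(example["prompt"])
    print(example["target"])
```

A cross-modal instruction, by contrast, would pair an input in one modality directly with an output in another, without the intermediate text steps.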

Model Capabilities

Speech recognition
Speech synthesis
Cross-modal dialogue
Text generation
Multimodal instruction following

Use Cases

Personal Assistant
Voice Q&A
Users ask questions by voice to get information
Provides accurate voice or text responses
Education
Language Learning
Assists learners in practicing English listening and speaking skills
Offers voice interaction and pronunciation feedback