Speechgpt 7B Ma

Developed by fnlp
SpeechGPT is a large language model with intrinsic cross-modal dialogue capabilities, capable of perceiving and generating multimodal content based on human instructions.
Downloads 37
Release Time: 9/14/2023

Model Overview

SpeechGPT uses discrete speech representations to construct a cross-modal speech instruction dataset, is trained with a three-stage strategy, and demonstrates strong multimodal instruction-following capabilities.

Model Features

Cross-modal Dialogue Capability
Capable of processing both speech and text input/output, enabling true cross-modal interaction
Three-stage Training Strategy
Adopts a three-stage training approach: modality adaptation pre-training, cross-modal instruction fine-tuning, and modality chain instruction fine-tuning
Large-scale Speech Instruction Dataset
Trained with the purpose-built SpeechInstruct dataset, which contains approximately 9 million unit-text pairs
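
To make the idea of a unit-text pair concrete, here is a minimal sketch of how a sequence of discrete speech units might be paired with its transcript. The `UnitTextPair` class, the helper names, the example unit IDs, and the `<N>` token rendering are all hypothetical illustrations, not the actual SpeechInstruct format or the SpeechGPT code.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List


@dataclass
class UnitTextPair:
    """Hypothetical container: discrete speech units plus the matching text."""
    units: List[int]
    text: str


def collapse_repeats(units: List[int]) -> List[int]:
    """Merge runs of identical consecutive unit IDs into a single ID."""
    return [unit for unit, _ in groupby(units)]


def units_to_tokens(units: List[int]) -> str:
    """Render unit IDs as a token string (assumed format, for illustration only)."""
    return "".join(f"<{unit}>" for unit in units)


# Made-up unit IDs standing in for the output of a speech tokenizer.
raw_units = [12, 12, 7, 7, 7, 301, 12]
pair = UnitTextPair(units=collapse_repeats(raw_units), text="hello")
token_string = units_to_tokens(pair.units)
```

Representing speech as a sequence of integer IDs is what lets a text language model consume and emit it like ordinary vocabulary tokens, which is the premise behind the cross-modal dataset described above.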

Model Capabilities

Speech recognition
Speech synthesis
Cross-modal dialogue
Text generation
Instruction following

Use Cases

Personal Assistant
Voice Q&A
Get answers by asking questions aloud
Understands spoken questions and generates speech or text responses
Education
Language Learning
Helps learners practice English listening and speaking skills
Provides an interactive voice learning experience