Dasheng 1.2B

Developed by mispeech
Dasheng is a general-purpose audio encoder trained with large-scale self-supervised learning. It captures rich audio information across multiple domains, including speech, music, and environmental sounds.
Downloads 135
Release Time: 6/6/2024

Model Overview

Dasheng 1.2B is a general-purpose audio encoder with 1.2 billion parameters, trained on 272,356 hours of diverse audio data. It performs well on speech, music, and environmental sound classification tasks.

Model Features

Large-scale Training
Trained with 272,356 hours of diverse audio data
Multi-domain Applicability
Capable of processing various audio types including speech, music, and environmental sounds
High Performance
Outperforms previously reported results on the HEAR benchmark across multiple tasks
General-purpose Encoder
Extracts audio embedding features suitable for various downstream tasks
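
As a rough illustration of how embeddings from a general-purpose encoder can feed a downstream task, the sketch below mean-pools frame-level embeddings into one clip-level vector and classifies it with a nearest-centroid rule. This is a minimal sketch under stated assumptions: it does not call any Dasheng API, the toy vectors are hand-written placeholders for encoder output, and the nearest-centroid classifier is a simple stand-in chosen for brevity, not the evaluation method used for this model.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean_pool(frames: Sequence[Sequence[float]]) -> Vector:
    """Average frame-level embeddings (T x D) into one clip-level vector (D)."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


def fit_centroids(clips: Sequence[Vector], labels: Sequence[str]) -> Dict[str, Vector]:
    """Compute one centroid per label from clip-level embeddings."""
    grouped: Dict[str, List[Vector]] = {}
    for vector, label in zip(clips, labels):
        grouped.setdefault(label, []).append(vector)
    return {label: mean_pool(vectors) for label, vectors in grouped.items()}


def predict(clip: Vector, centroids: Dict[str, Vector]) -> str:
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda label: sq_dist(clip, centroids[label]))


# Hypothetical placeholder data: 2 frames x 3 dimensions per clip.
# Real encoder output would have many more frames and dimensions.
speech_frames = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]
music_frames = [[0.0, 1.0, 0.0], [0.0, 0.8, 0.2]]

centroids = fit_centroids(
    [mean_pool(speech_frames), mean_pool(music_frames)],
    ["speech", "music"],
)

query = mean_pool([[0.9, 0.1, 0.0], [0.7, 0.1, 0.1]])
print(predict(query, centroids))  # prints "speech"
```

In practice the placeholder vectors would be replaced by the encoder's actual output, and the centroid rule by whatever shallow classifier suits the task.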

Model Capabilities

Audio Feature Extraction
Speech Classification
Music Classification
Environmental Sound Classification
Audio Embedding Generation

Use Cases

Speech Processing
  Speech Command Recognition
    Recognize short speech commands
    Excellent performance on Speech Commands tasks
  Speaker Counting
    Count the number of speakers in audio
    Achieves good results on LibriCount tasks
Music Analysis
  Music Classification
    Classify music clips
    Excellent performance in music classification tasks
Environmental Sound Analysis
  Environmental Sound Recognition
    Identify various sounds in the environment
    Good performance in environmental sound classification tasks