Dasheng Base

Developed by mispeech
A large-scale, general-purpose audio encoder trained via self-supervised learning. It processes audio from multiple domains, including speech, music, and environmental sounds.
Downloads 273
Release Time: 6/6/2024

Model Overview

Dasheng is a general-purpose audio encoder trained with large-scale self-supervised learning. It is designed to capture rich audio information across multiple domains, such as speech, music, and environmental sounds.

Model Features

Large-scale training
Training data covers 272,356 hours of diverse audio
Multi-domain applicability
Capable of processing various audio types including speech, music, and environmental sounds
High performance
Shows significant performance improvements on the HEAR benchmark over previously reported results

Model Capabilities

Audio feature extraction
Speech classification
Music classification
Environmental sound classification
Audio embedding generation
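
To make the embedding-generation capability concrete, the following is a minimal sketch of how frame-level encoder outputs are commonly reduced to a single clip-level embedding and then compared. It does not call the Dasheng library. The frame vectors are made-up placeholder values, and the helper names mean_pool and cosine_similarity are hypothetical. It assumes only that an encoder of this kind returns one vector per time frame.

```python
import math
from typing import List, Sequence


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average frame-level vectors (time x dim) into one clip-level vector."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / len(frames) for d in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Placeholder frame-level outputs (assumption: 3-dimensional for readability;
# a real encoder would produce far larger vectors).
clip_a = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
clip_b = [[2.0, 0.0, 2.0]]

emb_a = mean_pool(clip_a)
emb_b = mean_pool(clip_b)
similarity = cosine_similarity(emb_a, emb_b)
```

Mean pooling is only one possible choice. Tasks that depend on timing would keep the frame-level sequence instead of averaging it away.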

Use Cases

Speech processing
Speech command recognition
Used for identifying speech commands
Excellent performance on Speech Commands tasks
Spoken language identification
Used for identifying the language spoken in a recording
Excellent performance on VoxLingua tasks
Music analysis
Music classification
Classifying music genres
Excellent performance in music classification tasks
Environmental sound analysis
Environmental sound classification
Classifying environmental sounds
Excellent performance in environmental sound classification tasks
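
The classification use cases above typically keep the encoder frozen and train a small classifier on its embeddings, which is also how benchmarks such as HEAR evaluate audio encoders. The sketch below uses a nearest-centroid classifier as a simple stand-in for that downstream classifier. It is not the method used by the Dasheng authors, and the two-dimensional embeddings, labels, and function names are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def fit_centroids(
    examples: Sequence[Tuple[Sequence[float], str]]
) -> Dict[str, List[float]]:
    """Compute the mean embedding for each label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for embedding, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(embedding)
        for i, value in enumerate(embedding):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: [value / counts[label] for value in total]
        for label, total in sums.items()
    }


def predict(centroids: Dict[str, List[float]], embedding: Sequence[float]) -> str:
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def squared_distance(centroid: Sequence[float]) -> float:
        return sum((c - e) ** 2 for c, e in zip(centroid, embedding))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Invented clip-level embeddings standing in for frozen encoder outputs.
train = [
    ([0.9, 0.1], "speech"),
    ([1.1, -0.1], "speech"),
    ([0.0, 1.0], "music"),
    ([0.2, 0.8], "music"),
]

centroids = fit_centroids(train)
label_1 = predict(centroids, [0.8, 0.2])
label_2 = predict(centroids, [0.1, 1.2])
```

In practice a linear layer or small MLP trained on the frozen embeddings would usually replace the centroid rule. The structure stays the same: embed once, then fit a lightweight model per task.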