Ssast Base Patch Audioset 16 16

Developed by Simon-Kotchou
Audio classification model pre-trained on AudioSet and LibriSpeech using self-supervised learning
Downloads 56
Release Time: 1/10/2024

Model Overview

This model converts audio into spectrograms and processes them with a vision transformer architecture. It is intended as a starting point for a range of audio classification tasks. The classifier head must be fine-tuned on the target task before the model is used for prediction.
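The "16 16" in the model name suggests that the spectrogram is cut into 16 x 16 patches before it reaches the transformer. The sketch below illustrates that step in plain Python. The function name `split_into_patches`, the 128 x 1024 input shape, and the non-overlapping layout are assumptions made for illustration, not details confirmed by this listing.

```python
def split_into_patches(spec, patch_f=16, patch_t=16):
    """Split a 2-D spectrogram into non-overlapping patches.

    `spec` is a list of frequency rows, each a list of time-frame values.
    Patches are returned in row-major order; trailing rows or columns
    that do not fill a whole patch are dropped.
    """
    n_freq = len(spec)
    n_time = len(spec[0]) if n_freq else 0
    patches = []
    for f0 in range(0, n_freq - patch_f + 1, patch_f):
        for t0 in range(0, n_time - patch_t + 1, patch_t):
            # Each patch is a patch_f x patch_t block of the spectrogram.
            patch = [row[t0:t0 + patch_t] for row in spec[f0:f0 + patch_f]]
            patches.append(patch)
    return patches


# Hypothetical input: 128 frequency bins x 1024 time frames of zeros.
spec = [[0.0] * 1024 for _ in range(128)]
patches = split_into_patches(spec)
print(len(patches))  # (128 // 16) * (1024 // 16) = 512
```

Each patch would then be flattened and projected to an embedding, so the transformer sees the clip as a sequence of patch tokens in the same way a ViT sees an image.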

Model Features

Self-supervised pre-training
Pre-trains on large-scale audio data with self-supervised learning, reducing dependence on labeled data
Spectrogram transformer architecture
Applies a vision transformer (ViT) to audio spectrograms for feature extraction
Multi-task adaptability
The pre-trained model can be fine-tuned for various audio classification tasks
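The listing does not say which self-supervised objective was used. One common choice for spectrogram transformers is masked patch prediction: a random subset of patches is hidden and the model learns to recover them from the visible ones. The sketch below shows only the masking step, under that assumption. The name `mask_patches`, the 0.5 mask ratio, and the patch count of 512 are hypothetical.

```python
import random


def mask_patches(num_patches, mask_ratio=0.5, seed=0):
    """Choose which patch indices to hide for a masked-prediction task.

    Returns (masked, visible): two sorted, disjoint lists of indices
    that together cover range(num_patches).
    """
    rng = random.Random(seed)  # seeded so the split is reproducible
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(rng.sample(range(num_patches), num_masked))
    masked_set = set(masked)
    visible = [i for i in range(num_patches) if i not in masked_set]
    return masked, visible


# Hypothetical sequence of 512 patch tokens, half of them masked.
masked, visible = mask_patches(512, mask_ratio=0.5)
print(len(masked), len(visible))  # 256 256
```

Because the training targets come from the audio itself, no labels are needed at this stage; labels are only required later, when the classifier head is fine-tuned.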

Model Capabilities

Audio feature extraction
Audio classification
Spectrogram analysis

Use Cases

Audio content analysis
Environmental sound classification
Identifies and classifies various environmental sounds (e.g., animal calls, traffic noise)
Reported to achieve state-of-the-art performance on the AudioSet benchmark
Speech emotion recognition
Analyzes a speaker's emotional state from speech spectrograms
Speech processing
Voice command recognition
Recognizes short voice commands