Ssast Small Patch Audioset 16 16

Developed by Simon-Kotchou
An audio classification model pre-trained on AudioSet and LibriSpeech that uses a Vision Transformer architecture to process audio spectrograms
Downloads 2,408
Release Time: 1/10/2024

Model Overview

This model converts audio into spectrograms and applies a Vision Transformer (ViT) architecture to them, an approach reported to achieve state-of-the-art results on multiple audio classification tasks. The checkpoint includes an uninitialized classifier head, so it must be fine-tuned on a labeled dataset before it can be used for classification.
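As a rough illustration of how a spectrogram becomes transformer input, the sketch below splits a 2-D spectrogram into non-overlapping 16 x 16 patches, which is what the "16 16" in the model name suggests. The input size of 128 mel bins by 1024 frames, the function name patchify, and the dummy values are assumptions made for this example; they are not taken from the model's published configuration.

```python
def patchify(spec, patch_h=16, patch_w=16):
    """Split a 2-D spectrogram (list of rows) into non-overlapping patches.

    Returns the patches in row-major order. Each patch is flattened to a
    list of patch_h * patch_w values, i.e. one input token.
    """
    n_rows, n_cols = len(spec), len(spec[0])
    if n_rows % patch_h or n_cols % patch_w:
        raise ValueError("spectrogram dimensions must be multiples of the patch size")
    patches = []
    for top in range(0, n_rows, patch_h):
        for left in range(0, n_cols, patch_w):
            patch = [spec[r][c]
                     for r in range(top, top + patch_h)
                     for c in range(left, left + patch_w)]
            patches.append(patch)
    return patches


# Assumed input size (hypothetical): 128 mel bins x 1024 time frames.
N_MELS, N_FRAMES = 128, 1024
# Dummy spectrogram whose values encode their own position.
spec = [[float(r * N_FRAMES + c) for c in range(N_FRAMES)] for r in range(N_MELS)]

tokens = patchify(spec)
print(len(tokens), len(tokens[0]))  # 512 256
```

Under these assumptions, the spectrogram yields (128 / 16) x (1024 / 16) = 512 tokens of 256 values each. A real model would then project each token to the embedding dimension and add positional information.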

Model Features

Self-supervised pre-training
Learns general-purpose audio features from large-scale unlabeled audio through self-supervised learning
Spectrogram transformer architecture
Applies the Vision Transformer (ViT) to audio spectrograms for end-to-end audio feature learning
Multi-task adaptability
The pre-trained model can be fine-tuned for a variety of audio classification tasks
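Self-supervised pre-training for spectrogram transformers is commonly done by hiding some patches and training the model to recover or identify them. The sketch below shows only the masking step, under assumptions: the function names, the 0.5 mask ratio, and the constant mask value are hypothetical choices for illustration and do not describe this checkpoint's actual training recipe.

```python
import random


def mask_patches(num_patches, mask_ratio=0.4, seed=0):
    """Pick a reproducible random subset of patch indices to hide."""
    rng = random.Random(seed)
    n_masked = int(num_patches * mask_ratio)
    return sorted(rng.sample(range(num_patches), n_masked))


def apply_mask(tokens, masked_idx, mask_value=0.0):
    """Return a copy of tokens with the selected patches overwritten.

    A constant placeholder stands in for the mask token here; a trained
    model would typically use a learned embedding instead.
    """
    masked_set = set(masked_idx)
    return [[mask_value] * len(tok) if i in masked_set else list(tok)
            for i, tok in enumerate(tokens)]


# Eight toy patches of four values each (hypothetical data).
tokens = [[float(i)] * 4 for i in range(1, 9)]
masked = mask_patches(len(tokens), mask_ratio=0.5)
corrupted = apply_mask(tokens, masked)

for i, tok in enumerate(corrupted):
    print(i, "masked" if i in masked else "visible", tok)
```

During pre-training, the loss would be computed only at the masked positions, so no labels are needed. That is why the checkpoint ships without a trained classifier head.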

Model Capabilities

Audio feature extraction
Audio classification
Spectrogram analysis
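To make the link between feature extraction and classification concrete, here is a minimal sketch of a classifier head: patch embeddings are mean-pooled into one vector, passed through a linear layer, and normalized with softmax. The dimensions, the mean-pooling choice, and the random weights are all assumptions for illustration. The random weights mimic an uninitialized head, so the printed probabilities carry no meaning until the head is fine-tuned.

```python
import math
import random


def mean_pool(embeddings):
    """Average a list of equal-length vectors into a single vector."""
    dim = len(embeddings[0])
    return [sum(e[d] for e in embeddings) / len(embeddings) for d in range(dim)]


def linear(x, weights, bias):
    """Compute weights @ x + bias for a list-of-rows weight matrix."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical sizes, chosen small so the example runs instantly.
NUM_PATCHES, EMBED_DIM, NUM_CLASSES = 6, 8, 3
rng = random.Random(0)

# Stand-in for encoder output: one embedding per spectrogram patch.
embeddings = [[rng.gauss(0, 1) for _ in range(EMBED_DIM)]
              for _ in range(NUM_PATCHES)]

# Randomly initialized (untrained) classifier head.
weights = [[rng.gauss(0, 0.02) for _ in range(EMBED_DIM)]
           for _ in range(NUM_CLASSES)]
bias = [0.0] * NUM_CLASSES

probs = softmax(linear(mean_pool(embeddings), weights, bias))
print(probs)
```

Fine-tuning replaces these random head weights with values learned from labeled examples of the target task, usually while also updating the encoder.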

Use Cases

Audio content analysis
Environmental sound classification
Identify types of environmental sounds in recordings (e.g., rain, traffic noise)
The underlying approach reports strong results on the AudioSet benchmark
Speech content classification
Classify speech recordings (e.g., emotion recognition, language identification)
Pre-training on LibriSpeech makes the model suitable for speech-related tasks