Ast Finetuned Audioset 10 10 0.4593
The Audio Spectrogram Transformer (AST), fine-tuned here on AudioSet, converts audio into spectrograms and applies a Vision Transformer to classify them.
Downloads 308.88k
Release Time: 11/14/2022
Model Overview
This model converts audio signals into spectrogram images and then applies a Vision Transformer (ViT) architecture to classify them, with strong results on audio classification benchmarks such as AudioSet.
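The audio-to-spectrogram step can be illustrated with a minimal, dependency-free sketch. It is a simplification, not the model's own preprocessing: AST uses log Mel filterbank features, whereas this example computes a plain short-time Fourier transform with a naive DFT. The function name, frame length, hop size, and test tone are all assumptions chosen for illustration.

```python
import cmath
import math


def spectrogram(signal, frame_len=64, hop=32):
    """Return one row per frame; each row holds log-magnitudes (dB) per frequency bin.

    Illustrative only: a plain STFT, not the log Mel features AST actually uses.
    """
    # A Hann window reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # keep non-negative frequencies only
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(n_bins):
            # Naive DFT for bin k (quadratic cost, acceptable for a small demo).
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n, x in enumerate(frame)
            )
            row.append(20 * math.log10(abs(coeff) + 1e-10))
        frames.append(row)
    return frames


# Hypothetical input: a 1000 Hz tone sampled at 8000 Hz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]
spec = spectrogram(tone)

# Bin k corresponds to k * sample_rate / frame_len Hz, so 1000 Hz falls in bin 8.
peak_bins = [row.index(max(row)) for row in spec]
print(peak_bins)
```

The resulting time-by-frequency grid is the kind of two-dimensional "image" that a vision model can then consume.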
Model Features
Spectrogram Conversion
Converts audio signals into visual spectrogram representations, enabling vision transformers to process audio data.
High-performance Audio Classification
The AST authors reported state-of-the-art results at the time of publication on AudioSet, ESC-50, and Speech Commands V2.
Based on ViT Architecture
Uses the Vision Transformer architecture to process audio spectrograms, showing that an architecture designed for images can transfer to audio.
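As a rough worked example of how a ViT-style model sees a spectrogram, the sketch below counts the patches produced when a spectrogram is split into overlapping squares. The numbers are assumptions taken from the AST reference setup rather than from this page: 128 Mel bins, 1024 time frames (about 10 seconds), 16x16 patches, and a stride of 10 in both frequency and time, which is what the "10 10" in the model name is assumed to denote. The function name is hypothetical.

```python
def num_patches(n_mels=128, n_frames=1024, patch=16, fstride=10, tstride=10):
    """Count patches along frequency and time for a strided patch split.

    Defaults are assumed values from the AST reference setup, not from this page.
    """
    f_dim = (n_mels - patch) // fstride + 1    # patches along the frequency axis
    t_dim = (n_frames - patch) // tstride + 1  # patches along the time axis
    return f_dim, t_dim, f_dim * t_dim


f_dim, t_dim, total = num_patches()
print(f_dim, t_dim, total)  # 12 101 1212

# With non-overlapping patches (stride 16) the sequence is much shorter.
print(num_patches(fstride=16, tstride=16))  # (8, 64, 512)
```

Each patch becomes one token in the transformer's input sequence, so a smaller stride gives the model a longer sequence with more overlap between neighbouring patches.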
Model Capabilities
Audio Classification
Audio Feature Extraction
Spectrogram Analysis
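The classification capability can be sketched with the Hugging Face transformers library. This is an illustrative sketch, not an official usage recipe: the Hub identifier below is assumed to correspond to this model, the helper names are hypothetical, and running classify requires torch and transformers plus a model download. A sigmoid is applied per class because AudioSet is a multi-label dataset.

```python
import math

# Assumed Hugging Face Hub id for this model (not stated on this page).
MODEL_ID = "MIT/ast-finetuned-audioset-10-10-0.4593"


def top_labels(logits, id2label, k=3):
    """Return the k highest-scoring (label, probability) pairs.

    Uses an independent sigmoid per class, since several sounds can co-occur.
    """
    probs = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(id2label[i], probs[i]) for i in ranked]


def classify(waveform, sampling_rate=16000, k=3):
    """Classify a mono waveform (1-D sequence of floats) and return top-k labels."""
    # Imported lazily so the helper above works without these packages installed.
    import torch
    from transformers import ASTFeatureExtractor, ASTForAudioClassification

    extractor = ASTFeatureExtractor.from_pretrained(MODEL_ID)
    model = ASTForAudioClassification.from_pretrained(MODEL_ID)

    # The extractor turns raw samples into the spectrogram features the model expects.
    inputs = extractor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return top_labels(logits, model.config.id2label, k)
```

Calling classify(waveform) with mono audio resampled to 16 kHz would return a short list of label and probability pairs; the ranking helper can also be reused on logits obtained some other way.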
Use Cases
Audio Content Analysis
Environmental Sound Classification
Identifies and classifies environmental sounds such as animal calls and vehicle sounds.
Performs well on benchmarks such as AudioSet.
Music Classification
Classifies music clips by genre or instrument.
Multimedia Content Understanding
Video Audio Analysis
Classifies the audio track of a video so that the results can be combined with visual analysis for multimodal understanding.
© 2025 AIbase