AST Finetuned AudioSet 14 14 0.443
An Audio Spectrogram Transformer (AST) fine-tuned on the AudioSet dataset. It converts audio into a spectrogram and processes that spectrogram with a vision transformer architecture to perform audio classification.
Downloads 194.20k
Release Time: 11/14/2022
Model Overview
This model applies a vision transformer architecture to audio spectrograms. It is designed for audio classification tasks and was fine-tuned on the AudioSet dataset.
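A minimal usage sketch, assuming the checkpoint is published on the Hugging Face Hub as MIT/ast-finetuned-audioset-14-14-0.443 and that the transformers library is installed. Neither the model ID nor the helper name classify comes from this page; both are assumptions for illustration.

```python
# Assumption: the checkpoint is available on the Hugging Face Hub under this ID.
MODEL_ID = "MIT/ast-finetuned-audioset-14-14-0.443"


def classify(audio_path: str, top_k: int = 5):
    """Return the top_k predictions as a list of {"label", "score"} dicts."""
    # Imported lazily so that defining this helper does not require the library.
    from transformers import pipeline

    classifier = pipeline("audio-classification", model=MODEL_ID)
    # Decoding a file path requires ffmpeg to be installed on the system.
    return classifier(audio_path, top_k=top_k)
```

Calling classify("example.wav"), where example.wav is a placeholder path, would download the weights on first use and return label and score pairs sorted by score.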
Model Features
Spectrogram Conversion
Converts audio signals into spectrogram form, enabling vision transformer architectures to process audio data.
Transformer-based
Uses a vision transformer architecture instead of a CNN, so it does not depend on the inductive biases built into convolutional layers.
AudioSet Fine-tuning
Fine-tuned on the large-scale AudioSet dataset of labeled sound events, which makes it suitable for general-purpose audio classification.
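The first two features can be sketched together: turn a waveform into a log-power spectrogram, then cut that 2-D array into square patches, the input unit a vision transformer works on. This is an illustrative sketch, not the model's actual front end. The frame length, hop, patch size, stride and sample rate are hypothetical values, and a real pipeline would typically use an FFT library and a mel filterbank rather than the naive DFT shown here.

```python
import cmath
import math


def log_spectrogram(samples, frame_len=64, hop=32):
    """Log-power spectrogram: one row per time frame, one column per frequency bin."""
    # Hann window reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # keep non-negative frequencies only
    spec = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(n_bins):
            # Naive DFT for clarity; an FFT would be used in practice.
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n, x in enumerate(frame))
            row.append(math.log(abs(coeff) ** 2 + 1e-10))
        spec.append(row)
    return spec


def split_into_patches(spec, patch=16, stride=16):
    """Cut a 2-D spectrogram into square patches (lists of rows)."""
    n_time, n_freq = len(spec), len(spec[0])
    patches = []
    for t in range(0, n_time - patch + 1, stride):
        for f in range(0, n_freq - patch + 1, stride):
            patches.append([row[f:f + patch] for row in spec[t:t + patch]])
    return patches


# A quarter second of a 500 Hz tone at a hypothetical 8 kHz sample rate.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 500 * n / sample_rate) for n in range(2000)]
spec = log_spectrogram(tone)
patches = split_into_patches(spec)
```

With these settings each frequency bin spans 125 Hz, so the tone's energy concentrates in bin 4 of every frame. A stride smaller than the patch size would produce overlapping patches.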
Model Capabilities
Audio Classification
Spectrogram Analysis
Multi-class Audio Recognition
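How raw classifier outputs become the recognized classes listed above can be sketched as follows. Because an AudioSet clip can carry several labels at once, the sketch assumes each class is scored independently with a sigmoid and kept if it passes a threshold. The function name top_labels, the threshold, the label names and the logit values are all hypothetical and are not taken from the model.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def top_labels(logits, labels, k=3, threshold=0.5):
    """Return up to k (label, probability) pairs whose probability passes the threshold."""
    scored = [(label, sigmoid(z)) for label, z in zip(labels, logits)]
    # Highest probability first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(label, p) for label, p in scored[:k] if p >= threshold]


# Hypothetical class names and logits, for illustration only.
labels = ["Speech", "Dog", "Music", "Vehicle"]
logits = [2.0, -1.0, 0.5, -3.0]
predictions = top_labels(logits, labels)
```

Here two classes pass the threshold, so the clip is tagged with both "Speech" and "Music" rather than being forced into a single class.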
Use Cases
Audio Analysis
Environmental Sound Classification
Identifies and classifies various environmental sounds, such as animal calls and vehicle noises.
Music Classification
Classifies music segments to identify genres or instruments.
Multimedia Content Analysis
Video Audio Analysis
Analyzes audio content in videos to assist in video classification and retrieval.