
AST Fine-tuned AudioSet 10-10-0.4593

Developed by MIT
The Audio Spectrogram Transformer (AST), fine-tuned on AudioSet, converts audio into spectrograms and classifies them with a Vision Transformer.
Downloads: 308.88k
Release Time: 11/14/2022

Model Overview

This model converts audio signals into spectrogram images and then classifies them with a Vision Transformer (ViT) architecture, achieving strong results on several audio classification benchmarks.
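
As a rough illustration of the first step, the sketch below frames a waveform and computes a log power spectrogram with a Hann window and a naive DFT, using only the Python standard library. It is a simplified stand-in, not the model's actual preprocessing. According to the AST paper, the model consumes 128-bin log-mel filterbank features computed with a 25 ms window every 10 ms, and the mel warping step is omitted here. The function names are hypothetical.

```python
import math


def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames, dropping any ragged tail."""
    if len(signal) < frame_len:
        return []
    n_frames = 1 + (len(signal) - frame_len) // hop
    return [signal[i * hop:i * hop + frame_len] for i in range(n_frames)]


def hann(n):
    """Symmetric Hann window of length n (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def log_spectrogram(signal, frame_len, hop, eps=1e-10):
    """Return [time][freq] log power values for DFT bins 0..frame_len // 2.

    Simplified sketch: linear-frequency bins only, no mel filterbank.
    """
    window = hann(frame_len)
    spec = []
    for frame in frame_signal(signal, frame_len, hop):
        x = [s * w for s, w in zip(frame, window)]
        row = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT for bin k: O(frame_len) work per bin.
            re = sum(v * math.cos(2 * math.pi * k * n / frame_len) for n, v in enumerate(x))
            im = -sum(v * math.sin(2 * math.pi * k * n / frame_len) for n, v in enumerate(x))
            # eps avoids log(0) for silent frames.
            row.append(math.log(re * re + im * im + eps))
        spec.append(row)
    return spec
```

Assuming 16 kHz audio, a 25 ms window and a 10 ms hop correspond to `frame_len=400` and `hop=160`. The naive DFT is only practical for short inputs, so a real pipeline would use an FFT and then apply a mel filterbank.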

Model Features

Spectrogram Conversion
Converts audio signals into visual spectrogram representations, enabling vision transformers to process audio data.
High-performance Audio Classification
Reported state-of-the-art results at the time of publication on several audio classification benchmarks, including AudioSet, ESC-50, and Speech Commands V2.
Based on ViT Architecture
Applies the Vision Transformer architecture to audio spectrograms, showing that an architecture designed for images can transfer to another modality.
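
The ViT step operates on patches of the spectrogram. The sketch below uses hypothetical helpers to cut a [frequency][time] matrix into overlapping square patches and flatten each one into a vector. The defaults are assumptions taken from the AST paper's setup: 16×16 patches over a 128-mel-bin × 1024-frame input (about 10.24 s at a 10 ms hop), with a stride of 10 in both frequency and time. That stride appears to be what the "10-10" in this checkpoint's name refers to.

```python
def patch_grid(n_freq, n_time, patch=16, fstride=10, tstride=10):
    """Number of patch positions along frequency and time."""
    f_dim = (n_freq - patch) // fstride + 1
    t_dim = (n_time - patch) // tstride + 1
    return f_dim, t_dim


def extract_patches(spec, patch=16, fstride=10, tstride=10):
    """Flatten overlapping patch x patch tiles of a [freq][time] matrix.

    Patches are emitted frequency-major: all time positions for the first
    frequency band, then the next band, and so on.
    """
    f_dim, t_dim = patch_grid(len(spec), len(spec[0]), patch, fstride, tstride)
    patches = []
    for fi in range(f_dim):
        for ti in range(t_dim):
            f0, t0 = fi * fstride, ti * tstride
            patches.append([spec[f][t]
                            for f in range(f0, f0 + patch)
                            for t in range(t0, t0 + patch)])
    return patches


# With the assumed AST input size, this gives a 12 x 101 grid of patches.
f_dim, t_dim = patch_grid(128, 1024)
print(f_dim, t_dim, f_dim * t_dim)  # 12 101 1212
```

Under these assumptions the model sees 1,212 overlapping patches. In a ViT, each flattened patch is then linearly projected to an embedding before it enters the Transformer encoder.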

Model Capabilities

Audio Classification
Audio Feature Extraction
Spectrogram Analysis
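
For audio classification, the model outputs one logit per class. AudioSet is a multi-label dataset with 527 classes, so per-class scores are usually read independently through a sigmoid rather than competing through a softmax. The sketch below assumes the logits have already been computed and that an id-to-label mapping is available. The function names and the four example labels are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def top_labels(logits, id2label, k=3, threshold=0.5):
    """Return up to k (label, probability) pairs whose sigmoid score >= threshold."""
    scored = [(id2label[i], sigmoid(v)) for i, v in enumerate(logits)]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(name, p) for name, p in scored[:k] if p >= threshold]


# Hypothetical 4-class example; a real AudioSet head has 527 outputs.
labels = {0: "Speech", 1: "Music", 2: "Dog", 3: "Vehicle"}
print(top_labels([2.0, -1.0, 0.5, -3.0], labels))
# Speech (~0.88) and Dog (~0.62) pass the 0.5 threshold.
```

Because the scores are independent, several labels can fire for the same clip, such as speech over background music. The threshold and `k` are tuning choices here, not values prescribed by the model.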

Use Cases

Audio Content Analysis
Environmental Sound Classification
Identifies and classifies environmental sounds such as animal calls and vehicle noise.
Performs strongly on benchmarks such as AudioSet.
Music Classification
Classifies music clips by genre or instrument.
Multimedia Content Understanding
Video Audio Analysis
Classifies the audio track of videos; its outputs can be combined with visual content for multimodal analysis.