AST Finetuned AudioSet 10 10 0.448 V2

Developed by MIT
An Audio Spectrogram Transformer (AST) fine-tuned on the AudioSet dataset. It converts audio into spectrograms and processes them with a vision transformer, and it performs well on audio classification tasks.
Downloads 2,072
Release Date: 11/14/2022

Model Overview

This is an audio classification model based on the Vision Transformer (ViT) architecture. It converts an audio signal into a spectrogram, treats that spectrogram as an image, and processes it with a vision transformer, which makes it suitable for a range of audio classification tasks.
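The spectrogram-then-patches idea can be illustrated with a minimal NumPy sketch. It is not the model's actual front end: it uses a plain log-magnitude STFT rather than a mel filterbank, and the 25 ms window, 10 ms hop, 16x16 patch size, and stride of 10 are assumptions chosen for illustration (the "10 10" in the model name plausibly refers to the patch strides, but this page does not say so). The function names are hypothetical.

```python
import numpy as np

def log_spectrogram(wave, sr=16000, win_ms=25, hop_ms=10):
    """Return a (frames, bins) log-magnitude spectrogram of a 1-D waveform."""
    win = int(sr * win_ms / 1000)   # window length in samples (assumed 25 ms)
    hop = int(sr * hop_ms / 1000)   # hop length in samples (assumed 10 ms)
    n_frames = 1 + (len(wave) - win) // hop
    window = np.hamming(win)
    frames = np.stack(
        [wave[i * hop : i * hop + win] * window for i in range(n_frames)]
    )
    mag = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(mag + 1e-6)       # small offset avoids log(0)

def split_patches(spec, size=16, stride=10):
    """Cut a 2-D array into overlapping size x size patches, each flattened."""
    rows = range(0, spec.shape[0] - size + 1, stride)
    cols = range(0, spec.shape[1] - size + 1, stride)
    return np.stack(
        [spec[r : r + size, c : c + size].ravel() for r in rows for c in cols]
    )

sr = 16000
t = np.arange(sr) / sr
wave = np.sin(2 * np.pi * 440 * t)  # 1 s, 440 Hz test tone
spec = log_spectrogram(wave, sr)
patches = split_patches(spec)
print(spec.shape, patches.shape)    # (98, 201) (171, 256)
```

Each row of `patches` is one flattened image patch. In a ViT-style model, rows like these would be linearly projected into token embeddings before entering the transformer.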

Model Features

Spectrogram Conversion Processing
Converts audio signals into spectrograms, which a vision transformer then processes to capture audio features.
AudioSet Fine-tuning
Fine-tuned on AudioSet, a large-scale audio dataset, which gives it robust audio classification capabilities.
SOTA Performance
Achieves state-of-the-art performance on multiple audio classification benchmarks.

Model Capabilities

Audio Classification
Spectrogram Analysis
Audio Feature Extraction
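As a hedged sketch of how these capabilities might be used, the snippet below runs audio classification through the Hugging Face `transformers` classes `ASTFeatureExtractor` and `ASTForAudioClassification`. The Hub id in `MODEL_ID` is an assumption inferred from the model name and developer shown here; this page does not state it. The 16 kHz sampling rate is also an assumption, and `top_k` and `classify` are hypothetical helper names. AudioSet is multi-label, so the sketch applies a sigmoid per class rather than a softmax.

```python
# Assumed Hub id, inferred from the model name; not stated on this page.
MODEL_ID = "MIT/ast-finetuned-audioset-10-10-0.448-v2"

def top_k(probs, id2label, k=5):
    """Return the k highest-scoring (label, probability) pairs."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)[:k]
    return [(id2label[i], float(p)) for i, p in ranked]

def classify(waveform, sampling_rate=16000, k=5):
    """Classify a mono waveform (1-D float array); downloads weights on first use."""
    # Imported lazily so the helper above works without these packages.
    import torch
    from transformers import ASTFeatureExtractor, ASTForAudioClassification

    extractor = ASTFeatureExtractor.from_pretrained(MODEL_ID)
    model = ASTForAudioClassification.from_pretrained(MODEL_ID)
    inputs = extractor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Multi-label task: independent sigmoid score per class.
    probs = torch.sigmoid(logits)[0].tolist()
    return top_k(probs, model.config.id2label, k)

# Example call (requires network access and the packages above):
# print(classify(my_waveform))  # my_waveform: 1-D float array sampled at 16 kHz
```

Depending on the installed `transformers` version, `torchaudio` may also be needed for the feature extractor.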

Use Cases

Audio Content Analysis
Environmental Sound Classification
Identifies and classifies environmental sounds such as animal calls and vehicle noises.
High-accuracy sound category recognition
Music Classification
Classifies music clips by attributes such as genre and instrumentation.
Multimedia Content Moderation
Inappropriate Audio Detection
Identifies potentially inappropriate or sensitive content in audio.
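For the moderation use case, one possible approach is to threshold the per-label scores against a watchlist of sensitive classes. The sketch below is illustrative only: `flag_clip` is a hypothetical helper, and the watchlist entries, scores, and 0.5 threshold are made-up example values, not recommendations or measured outputs of this model.

```python
def flag_clip(label_probs, watchlist, threshold=0.5):
    """Return watchlisted (label, score) pairs at or above threshold, highest first."""
    hits = [
        (label, score)
        for label, score in label_probs.items()
        if label in watchlist and score >= threshold
    ]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Hypothetical scores for one clip; a real pipeline would take these from the classifier.
scores = {"Speech": 0.92, "Gunshot, gunfire": 0.71, "Screaming": 0.30, "Music": 0.05}
watchlist = {"Gunshot, gunfire", "Screaming"}  # illustrative label choice

flags = flag_clip(scores, watchlist)
print(flags)  # [('Gunshot, gunfire', 0.71)]
```

In practice the threshold would likely be tuned per label on held-out data, since a single cutoff rarely suits every class.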