Ast Finetuned Audioset 10 10 0.450

Developed by MIT
An Audio Spectrogram Transformer (AST) fine-tuned on the AudioSet dataset. It applies a Vision Transformer (ViT) architecture to audio spectrograms to perform audio classification.
Downloads: 109
Release Time: 11/14/2022

Model Overview

This model converts audio into a spectrogram, treats that spectrogram as an image, and processes it with a Vision Transformer. It is intended for audio classification tasks, and its authors report state-of-the-art results on several audio classification benchmarks.

Model Features

Spectrogram Processing
Converts audio signals into spectrogram format and processes them using a vision transformer architecture.
AudioSet Fine-tuning
Fine-tuned on the large-scale AudioSet dataset, providing robust audio classification capabilities.
State-of-the-Art Performance
The authors report state-of-the-art results on several audio classification benchmarks, including AudioSet.
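To make the spectrogram-as-image idea concrete, here is a minimal sketch of how a spectrogram could be cut into overlapping square patches before being passed to the transformer. The default values (128 mel bins, 1024 time frames for roughly 10 seconds of audio, 16x16 patches) and the reading of "10 10" in the model name as frequency and time strides of 10 are assumptions taken from the usual AST setup, not from this page. The function name `patch_grid` is hypothetical.

```python
def patch_grid(n_mels=128, n_frames=1024, patch=16, fstride=10, tstride=10):
    """Count the patches a ViT-style model would see for one spectrogram.

    All defaults are assumptions (typical AST settings), not values
    stated on this page.
    """
    # Number of patch positions along the frequency axis.
    freq_patches = (n_mels - patch) // fstride + 1
    # Number of patch positions along the time axis.
    time_patches = (n_frames - patch) // tstride + 1
    return freq_patches, time_patches, freq_patches * time_patches


if __name__ == "__main__":
    f, t, total = patch_grid()
    print(f"{f} x {t} = {total} patches")  # 12 x 101 = 1212 patches
```

With a stride smaller than the patch size, neighboring patches overlap, which increases the sequence length the transformer has to process. Setting both strides to 16 gives non-overlapping patches and a shorter sequence.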

Model Capabilities

Audio Classification
Spectrogram Analysis
Multi-label Audio Recognition
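The capabilities above could be exercised roughly as follows. This is a sketch under stated assumptions, not an official usage recipe: it assumes the model is published on the Hugging Face Hub as `MIT/ast-finetuned-audioset-10-10-0.450`, that it expects 16 kHz mono input, and that `transformers`, `torch`, and `librosa` are installed. The helpers `top_labels` and `classify_file` are hypothetical names. Because an AudioSet clip can carry several labels at once, each logit is passed through a sigmoid independently rather than through a softmax.

```python
import math

# Assumed Hub identifier; not stated on this page.
MODEL_ID = "MIT/ast-finetuned-audioset-10-10-0.450"


def top_labels(logits, id2label, k=5):
    """Apply a per-class sigmoid and return the k highest (label, score) pairs."""
    probs = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


def classify_file(path, k=5):
    """Classify one audio file (hypothetical helper; downloads the model)."""
    # Heavy dependencies are imported lazily so top_labels works without them.
    import librosa
    import torch
    from transformers import ASTFeatureExtractor, ASTForAudioClassification

    # Load as mono and resample to 16 kHz (assumed input rate).
    waveform, _ = librosa.load(path, sr=16000)

    extractor = ASTFeatureExtractor.from_pretrained(MODEL_ID)
    model = ASTForAudioClassification.from_pretrained(MODEL_ID)

    # The feature extractor turns the waveform into a log-mel spectrogram.
    inputs = extractor(waveform, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return top_labels(logits, model.config.id2label, k)
```

Calling `classify_file("clip.wav")` with a local file of your own would return the five highest-scoring labels with their scores. A score threshold can replace the top-k cut when every sufficiently likely label is wanted.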

Use Cases

Audio Content Analysis
Environmental Sound Classification
Identifies and classifies environmental sounds such as animal sounds and vehicle noises.
Predicts scores over the 527 sound classes defined in AudioSet.
Music Classification
Classifies music clips by genre or instrument.
Multimedia Content Moderation
Inappropriate Content Detection
Flags audio containing sound classes that a moderation policy treats as inappropriate or sensitive.