AST Finetuned AudioSet 14 14 0.443

Developed by MIT
An Audio Spectrogram Transformer (AST) fine-tuned on the AudioSet dataset. It converts audio into spectrograms and processes them with a vision transformer architecture for audio classification.
Downloads 194.20k
Release Time: 11/14/2022

Model Overview

This model applies a vision transformer architecture to audio spectrograms. It is designed for audio classification and fine-tuned on the AudioSet dataset.
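A vision transformer does not read a spectrogram pixel by pixel; it splits it into patches and treats each patch as one token. The sketch below only counts how many patches result. Every number in it is an assumption rather than something stated on this page: a 128 x 1024 spectrogram (frequency bins x time frames), 16 x 16 patches, and a stride of 14 on both axes, which is one plausible reading of the "14 14" in the model name. The function name `patch_grid` is hypothetical.

```python
def patch_grid(n_freq, n_time, patch=16, stride=14):
    """Count overlapping square patches along the frequency and time axes.

    All defaults are assumptions for illustration, not values confirmed
    by this page.
    """
    if n_freq < patch or n_time < patch:
        raise ValueError("spectrogram is smaller than a single patch")
    # Standard sliding-window count: floor((size - patch) / stride) + 1.
    return (n_freq - patch) // stride + 1, (n_time - patch) // stride + 1


# Assumed input size: 128 frequency bins by 1024 time frames.
freq_patches, time_patches = patch_grid(n_freq=128, n_time=1024)

# Each patch becomes one token in the transformer's input sequence.
sequence_length = freq_patches * time_patches
print(freq_patches, time_patches, sequence_length)
```

Because the stride (14) is smaller than the patch size (16), neighbouring patches overlap by two bins or frames. A smaller stride would give more tokens and a longer sequence for the transformer to process.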

Model Features

Spectrogram Conversion
Converts audio signals into spectrogram form, enabling vision transformer architectures to process audio data.
Transformer-based
Utilizes a vision transformer architecture, avoiding the inductive biases of traditional CNNs.
AudioSet Fine-tuning
Fine-tuned on the large-scale AudioSet dataset, providing robust audio classification capabilities.
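To illustrate the spectrogram conversion step, here is a minimal standard-library sketch that slices a signal into overlapping frames and computes a log-magnitude spectrum for each frame. It is not the model's actual front end: the frame length, hop size, Hann window, and plain DFT (with no mel filterbank) are simplifying assumptions, and the function name `spectrogram` is hypothetical.

```python
import cmath
import math


def spectrogram(samples, frame_len=64, hop=32):
    """Return a list of frames; each frame is a list of log-magnitudes per bin.

    Simplified sketch: Hann window plus a naive DFT, no mel scaling.
    """
    # A Hann window tapers frame edges to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        bins = []
        # Real input: only bins 0 .. frame_len / 2 carry unique information.
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            # Small offset avoids log(0) for silent bins.
            bins.append(math.log(abs(acc) + 1e-10))
        frames.append(bins)
    return frames


# Synthetic test signal: a 1 kHz sine sampled at 8 kHz (made-up input).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(1024)]

spec = spectrogram(tone)
# Index of the strongest frequency bin in the first frame.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), len(spec[0]), peak_bin * sample_rate / 64)
```

The result is a two-dimensional grid (time frames by frequency bins), which is what allows an image-style architecture to be applied to audio. In practice a library FFT and a mel filterbank would replace the naive DFT used here.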

Model Capabilities

Audio Classification
Spectrogram Analysis
Multi-class Audio Recognition
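As a rough illustration of how per-class model outputs can be turned into recognized labels, the sketch below applies a sigmoid to each raw score independently and keeps the classes above a threshold, so one clip can receive several labels at once. Treating the outputs this way is an assumption about how the model is used, and the class names, logit values, threshold, and the helper `labels_above_threshold` are all made up for the example.

```python
import math


def sigmoid(x):
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def labels_above_threshold(logits, class_names, threshold=0.5):
    """Return (label, probability) pairs above the threshold, best first."""
    scored = [(name, sigmoid(z)) for name, z in zip(class_names, logits)]
    kept = [(name, p) for name, p in scored if p >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Illustrative values only; a real model would output one score per class.
class_names = ["Speech", "Dog", "Vehicle", "Music"]
logits = [2.0, 0.3, -1.5, -3.0]

detected = labels_above_threshold(logits, class_names)
for name, prob in detected:
    print(f"{name}: {prob:.2f}")
```

Scoring each class independently, rather than with a softmax over all classes, lets overlapping sounds such as speech over a barking dog both be reported. The threshold trades recall against false positives and would need tuning for a given application.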

Use Cases

Audio Analysis
Environmental Sound Classification
Identifies and classifies various environmental sounds, such as animal calls and vehicle noises.
Music Classification
Classifies music segments to identify genres or instruments.
Multimedia Content Analysis
Video Audio Analysis
Analyzes audio content in videos to assist in video classification and retrieval.