
AST Finetuned AudioSet 12 12 0.447

Developed by MIT
An Audio Spectrogram Transformer (AST) fine-tuned on the AudioSet dataset. It applies the Vision Transformer (ViT) architecture to audio spectrograms and reports strong results on several audio classification benchmarks.
Downloads 25
Release Time: 11/14/2022

Model Overview

This model converts audio into spectrograms and processes them with a vision transformer. It is primarily used for audio classification and supports the 527 categories in AudioSet.
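As a rough illustration of the first step (audio to spectrogram), the sketch below computes a plain log-power spectrogram with a naive DFT using only the standard library. It is not the model's actual front end, which is not specified on this page; the function names, frame length, and hop size are hypothetical choices made for the example.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def log_power_spectrogram(samples, frame_len=64, hop=32, eps=1e-10):
    """Return a list of frames; each frame is a list of log-power values,
    one per frequency bin from 0 to frame_len // 2.

    Uses a naive O(N^2) DFT, so it only suits short illustrative signals.
    """
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = [samples[start + i] * window[i] for i in range(frame_len)]
        row = []
        for k in range(n_bins):
            acc = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            row.append(math.log(abs(acc) ** 2 + eps))
        frames.append(row)
    return frames


# Illustrative input: a 1 kHz sine sampled at 8 kHz.
# Bin spacing is 8000 / 64 = 125 Hz, so the tone falls on bin 8.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]
spec = log_power_spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), len(spec[0]), peak_bin)
```

The result is a time-by-frequency grid, which is the image-like input a vision transformer can then process.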

Model Features

Spectrogram Processing
Converts audio signals into spectrograms and processes them using a vision transformer for efficient audio feature extraction.
AudioSet Fine-tuning
Fine-tuned on the large-scale AudioSet dataset, supporting classification of 527 audio categories.
ViT Architecture Adaptation
Applies the Vision Transformer (ViT) architecture to the audio domain by treating the spectrogram as an image, with reported state-of-the-art (SOTA) performance.
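To make the ViT adaptation concrete, here is a minimal sketch of how a spectrogram could be cut into overlapping patches before being fed to the transformer. The numbers are assumptions, not facts stated on this page: a 128 x 1024 spectrogram (frequency bins x time frames), 16 x 16 patches, and a stride of 12 on both axes, which is one plausible reading of the "12 12" in the model name. The helper `patch_grid` is hypothetical.

```python
def patch_grid(n_freq_bins, n_frames, patch=16, f_stride=12, t_stride=12):
    """Return (patches along frequency, patches along time) for a
    sliding-window split with no padding."""
    f_dim = (n_freq_bins - patch) // f_stride + 1
    t_dim = (n_frames - patch) // t_stride + 1
    return f_dim, t_dim


# Assumed input size: 128 frequency bins x 1024 time frames.
f_dim, t_dim = patch_grid(128, 1024)
n_patches = f_dim * t_dim
print(f_dim, t_dim, n_patches)

# For comparison: non-overlapping patches (stride equal to patch size).
f_plain, t_plain = patch_grid(128, 1024, f_stride=16, t_stride=16)
print(f_plain, t_plain, f_plain * t_plain)
```

Under these assumptions a smaller stride means more overlap and a longer patch sequence, so the stride trades compute cost against how densely the spectrogram is sampled.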

Model Capabilities

Audio Classification
Spectrogram Analysis
Multi-category Audio Recognition

Use Cases

Content Classification
Environmental Sound Recognition
Identifies types of environmental sounds in recordings (e.g., rain, traffic noise)
Assigns labels from the 527 sound classes defined in AudioSet
Media Analysis
Video Soundtrack Analysis
Automatically analyzes the content categories of soundtracks in videos
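For the use cases above, a clip can contain several sounds at once, since AudioSet is a multi-label dataset. A minimal sketch of turning per-class scores into a list of detected labels follows: it applies an independent sigmoid to each score and keeps the top results above a threshold. The label names, scores, threshold, and the helper `top_labels` are hypothetical stand-ins, not actual model output.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def top_labels(logits, labels, k=3, threshold=0.5):
    """Return up to k (label, probability) pairs whose independent
    sigmoid probability is at least `threshold`, highest first."""
    scored = sorted(
        ((sigmoid(z), name) for z, name in zip(logits, labels)),
        reverse=True,
    )
    return [(name, round(p, 3)) for p, name in scored[:k] if p >= threshold]


# Hypothetical scores for four classes (a real model would emit 527).
labels = ["Rain", "Traffic noise", "Speech", "Music"]
logits = [2.0, 0.5, -3.0, -1.0]
detected = top_labels(logits, labels)
print(detected)
```

Because each class is scored independently, a recording of rain next to a busy road can be tagged with both labels rather than forced into one.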