Ast Finetuned Audioset 16 16 0.442

Developed by MIT
An Audio Spectrogram Transformer (AST) fine-tuned on the AudioSet dataset. It applies a vision transformer architecture to audio spectrograms and is intended for audio classification tasks.
Downloads 35
Release Time: 11/14/2022

Model Overview

This model converts audio into spectrograms and processes them with a vision transformer. It is designed for audio classification and recognizes the audio categories defined in the AudioSet dataset.
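A minimal usage sketch of the flow described above, assuming the Hugging Face transformers library and PyTorch are installed. The Hub ID "MIT/ast-finetuned-audioset-16-16-0.442" is inferred from the model name and developer on this page and is an assumption, as are the helper name classify, the 16 kHz mono input, and the choice of sigmoid scoring.

```python
def classify(waveform, sampling_rate=16000,
             model_id="MIT/ast-finetuned-audioset-16-16-0.442", top_k=5):
    """Return the top_k (label, score) pairs for a mono waveform.

    waveform: 1-D sequence of float samples (assumed to be 16 kHz mono).
    model_id: assumed Hub identifier; replace it if your copy lives elsewhere.
    """
    # Imported lazily so that defining this helper does not require the libraries.
    import torch
    from transformers import ASTFeatureExtractor, ASTForAudioClassification

    extractor = ASTFeatureExtractor.from_pretrained(model_id)
    model = ASTForAudioClassification.from_pretrained(model_id)
    model.eval()

    # The feature extractor turns the raw waveform into a log-mel spectrogram tensor.
    inputs = extractor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0]

    # AudioSet clips can carry several labels, so each class is scored independently.
    scores = torch.sigmoid(logits)
    top = torch.topk(scores, k=top_k)
    return [(model.config.id2label[i], s)
            for s, i in zip(top.values.tolist(), top.indices.tolist())]
```

Calling classify(samples) on a list or array of audio samples downloads the checkpoint on first use and returns label and score pairs sorted from highest to lowest score.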

Model Features

Spectrogram Conversion Processing
Converts audio signals into spectrogram format and processes them using a vision transformer architecture for efficient audio feature extraction.
AudioSet Fine-tuning
Fine-tuned on the large-scale AudioSet dataset, so it can classify audio across the AudioSet categories.
State-of-the-Art Performance
Achieved state-of-the-art results on multiple audio classification benchmarks at the time of release.
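The spectrogram-to-transformer step can be illustrated with a small sketch of how a vision transformer cuts a spectrogram into patches. The values used here are assumptions for illustration, not settings confirmed by this page: 128 mel bins, 1024 time frames, 16x16 patches, and a reading of the "16 16" in the model name as the frequency and time strides. The function names are hypothetical.

```python
def patch_grid(n_mels, n_frames, patch=16, f_stride=16, t_stride=16):
    """Return (patches along frequency, patches along time) for a spectrogram."""
    n_freq = (n_mels - patch) // f_stride + 1
    n_time = (n_frames - patch) // t_stride + 1
    return n_freq, n_time


def extract_patches(spec, patch=16, f_stride=16, t_stride=16):
    """Cut a spectrogram (list of rows, frequency x time) into flattened patches."""
    n_freq, n_time = patch_grid(len(spec), len(spec[0]), patch, f_stride, t_stride)
    patches = []
    for i in range(n_freq):
        for j in range(n_time):
            f0, t0 = i * f_stride, j * t_stride
            # Flatten one patch x patch window into a single vector (one token).
            patches.append([spec[f][t]
                            for f in range(f0, f0 + patch)
                            for t in range(t0, t0 + patch)])
    return patches


# Assumed input size: 128 mel bins by 1024 frames, filled with zeros as a placeholder.
spec = [[0.0] * 1024 for _ in range(128)]
tokens = extract_patches(spec)
print(patch_grid(128, 1024), len(tokens), len(tokens[0]))  # (8, 64) 512 256
```

Under these assumptions the transformer sees 512 tokens of 256 values each. A smaller stride makes the patches overlap and raises the token count, for example patch_grid(128, 1024, 16, 10, 10) gives (12, 101).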

Model Capabilities

Audio Classification
Spectrogram Analysis
Multi-category Audio Recognition
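Multi-category recognition means a single clip can receive several labels at once, because AudioSet annotations are multi-label. A minimal sketch of that decision step follows, assuming per-class sigmoid scores and a 0.5 threshold. The label names, logits, threshold, and function names are illustrative placeholders, not outputs of this model.

```python
import math


def sigmoid(x):
    """Map a raw logit to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def multi_label_predict(logits, labels, threshold=0.5):
    """Return every (label, score) whose score reaches the threshold, best first."""
    scored = [(label, sigmoid(z)) for label, z in zip(labels, logits)]
    hits = [(label, p) for label, p in scored if p >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical logits for four example categories.
labels = ["Speech", "Music", "Dog", "Siren"]
logits = [2.0, 0.3, -1.5, -3.0]
predictions = multi_label_predict(logits, labels)
print([label for label, _ in predictions])  # ['Speech', 'Music']
```

Unlike a softmax, the scores here do not compete with each other, so overlapping sounds such as speech over music can both be reported.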

Use Cases

Audio Content Analysis
Environmental Sound Recognition
Identifies various sounds in natural or urban environments
Accurately classifies hundreds of environmental sound types
Music Classification
Classifies music clips by genre or instrument
Multimedia Content Moderation
Inappropriate Content Detection
Identifies violent or inappropriate language in audio