Ast Finetuned Audioset 12 12 0.447
An Audio Spectrogram Transformer (AST) fine-tuned on the AudioSet dataset. It applies the ViT architecture to audio spectrograms and reports strong results on several audio classification benchmarks.
Downloads 25
Release Time: 11/14/2022
Model Overview
This model converts audio into spectrograms and processes them with a vision transformer. It is primarily used for audio classification and covers the 527 categories in AudioSet.
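As a rough illustration of that workflow, the sketch below assumes the checkpoint is published on the Hugging Face Hub as `MIT/ast-finetuned-audioset-12-12-0.447` and can be loaded through the `transformers` audio-classification pipeline; neither detail is stated on this page. The helper names `classify_audio` and `format_predictions` are hypothetical.

```python
def classify_audio(path, top_k=5,
                   model_id="MIT/ast-finetuned-audioset-12-12-0.447"):
    """Return the top_k predicted AudioSet labels for an audio file.

    Assumption: `model_id` is the Hub ID of this checkpoint.
    The import is local so the module loads even without transformers.
    """
    from transformers import pipeline

    # The pipeline handles resampling and spectrogram extraction internally.
    classifier = pipeline("audio-classification", model=model_id)
    # Result is a list of {"label": str, "score": float} dictionaries.
    return classifier(path, top_k=top_k)


def format_predictions(predictions):
    """Turn pipeline output into readable 'label: score' strings."""
    return [f"{p['label']}: {p['score']:.3f}" for p in predictions]
```

A call such as `format_predictions(classify_audio("street.wav"))` would then list the highest-scoring classes; `street.wav` is a placeholder file name, and decoding a file path typically requires ffmpeg to be installed.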
Model Features
Spectrogram Processing
Converts audio signals into spectrograms and processes them using a vision transformer for efficient audio feature extraction.
AudioSet Fine-tuning
Fine-tuned on the large-scale AudioSet dataset, supporting classification of 527 audio categories.
ViT Architecture Adaptation
Applies the Vision Transformer (ViT) architecture to the audio domain by treating the spectrogram as an image, and is reported to achieve state-of-the-art (SOTA) results on audio classification.
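To make the ViT adaptation more concrete, the following sketch counts how many patches a spectrogram is split into before it reaches the transformer. All numbers are assumptions not given on this page: a 128-bin by 1024-frame spectrogram, 16x16 patches, and a stride of 12 in both frequency and time (one reading of the "12 12" in the model name). `patch_grid` is a hypothetical helper.

```python
def patch_grid(n_mels=128, n_frames=1024, patch=16, f_stride=12, t_stride=12):
    """Count overlapping patches cut from an (n_mels x n_frames) spectrogram.

    Defaults are assumed values, not figures confirmed by this page.
    Returns (patches along frequency, patches along time, total patches).
    """
    # Same arithmetic as a convolution with kernel `patch` and no padding.
    n_freq = (n_mels - patch) // f_stride + 1
    n_time = (n_frames - patch) // t_stride + 1
    return n_freq, n_time, n_freq * n_time


print(patch_grid())                            # (10, 85, 850)
print(patch_grid(f_stride=10, t_stride=10))    # (12, 101, 1212)
```

A smaller stride means more overlap between patches and a longer input sequence, which raises compute cost.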
Model Capabilities
Audio Classification
Spectrogram Analysis
Multi-category Audio Recognition
Use Cases
Content Classification
Environmental Sound Recognition
Identifies types of environmental sounds in recordings (e.g., rain, traffic noise)
Classifies recordings into the 527 sound classes defined in AudioSet, which include environmental sounds
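Because a single recording can contain several sounds at once (rain and traffic together, for example), one plausible way to read the model's output is as independent per-class scores. The sketch below assumes raw logits are available and applies a sigmoid with a fixed threshold; the threshold of 0.5, the three-label example, and the helper names are illustrative assumptions, not values from the model.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def detect_labels(logits, labels, threshold=0.5):
    """Return (label, probability) pairs at or above threshold, best first.

    Each class is scored independently, so several labels can be
    returned for the same clip.
    """
    scored = [(label, sigmoid(z)) for label, z in zip(labels, logits)]
    kept = [(label, p) for label, p in scored if p >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical logits for three of the 527 classes.
hits = detect_labels([2.0, -1.0, 0.5], ["Rain", "Speech", "Traffic noise"])
print([label for label, _ in hits])   # ['Rain', 'Traffic noise']
```

In practice the threshold would be tuned per class on held-out data rather than fixed at 0.5.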
Media Analysis
Video Soundtrack Analysis
Automatically analyzes the content categories of soundtracks in videos