# Distil Audio Spectrogram Transformer AudioSet
Distil Audio Spectrogram Transformer AudioSet is an audio classification model built for efficiency. It leverages a distilled version of the Audio Spectrogram Transformer, providing a lightweight yet effective solution for audio analysis.
## 🚀 Quick Start
Distil Audio Spectrogram Transformer AudioSet is an audio classification model based on the Audio Spectrogram Transformer architecture. It is a distilled version of MIT/ast-finetuned-audioset-10-10-0.4593 on the AudioSet dataset.

This model was trained with HuggingFace's PyTorch framework on a Google Compute Engine VM with a Tesla A100 GPU. All scripts used for training can be found in the Files and versions tab, as can the training metrics logged via TensorBoard.
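For reference, here is a minimal inference sketch using the 🤗 Transformers `pipeline` API; the checkpoint ID (`bookbot/distil-ast-audioset`) and the audio file name are illustrative placeholders, not taken from this card:

```python
from transformers import pipeline

# Hypothetical repo ID; substitute the actual checkpoint path of this model.
classifier = pipeline("audio-classification", model="bookbot/distil-ast-audioset")

# Classify a local audio file (placeholder name) and print the top predictions.
predictions = classifier("sample.wav", top_k=5)
for pred in predictions:
    print(f"{pred['label']}: {pred['score']:.4f}")
```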
## 📚 Documentation
### ✨ Features
- Based on the Audio Spectrogram Transformer architecture.
- A distilled version of MIT/ast-finetuned-audioset-10-10-0.4593 on the AudioSet dataset (a generic distillation-loss sketch follows this list).
- Trained using HuggingFace's PyTorch framework on a Google Compute Engine VM with a Tesla A100 GPU.
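The card does not include the distillation objective itself; below is a minimal sketch of a standard logit-distillation loss for a multi-label task like AudioSet. The `alpha` weighting and the use of sigmoid soft targets are assumptions, not the authors' exact recipe:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5):
    """Blend hard-label BCE with a soft-target term from the teacher.

    Assumed recipe for a multi-label task: both terms use
    binary_cross_entropy_with_logits, with the teacher's sigmoid
    probabilities serving as soft targets for the student.
    """
    hard_loss = F.binary_cross_entropy_with_logits(student_logits, labels)
    soft_loss = F.binary_cross_entropy_with_logits(
        student_logits, torch.sigmoid(teacher_logits)
    )
    return alpha * hard_loss + (1.0 - alpha) * soft_loss
```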
### 📦 Model
| Property | Details |
|----------|---------|
| Model Type | distil-ast-audioset |
| #params | 44M |
| Architecture | Audio Spectrogram Transformer |
| Training/Validation data | AudioSet |
### 📊 Evaluation Results
The model achieves the following results on evaluation:
| Model | F1 | ROC AUC | Accuracy | mAP |
|-------|----|---------|----------|-----|
| Distil-AST AudioSet | 0.4876 | 0.7140 | 0.0714 | 0.4743 |
| AST AudioSet | 0.4989 | 0.6905 | 0.1247 | 0.5603 |
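The card does not specify how these metrics are computed. A plausible reading, sketched below with scikit-learn, is that they follow multi-label conventions, where Accuracy is exact-match subset accuracy (which would explain the low values on a 527-class multi-label task). The averaging modes and the 0.5 threshold are assumptions:

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    average_precision_score,
    f1_score,
    roc_auc_score,
)

# Toy multi-hot ground truth and sigmoid scores for a 3-class example.
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
y_score = np.array([[0.9, 0.2, 0.7], [0.1, 0.8, 0.3], [0.6, 0.7, 0.2]])
y_pred = (y_score > 0.5).astype(int)  # assumed 0.5 decision threshold

print("F1:", f1_score(y_true, y_pred, average="micro"))
print("ROC AUC:", roc_auc_score(y_true, y_score, average="micro"))
print("Accuracy:", accuracy_score(y_true, y_pred))  # exact-match subset accuracy
print("mAP:", average_precision_score(y_true, y_score, average="macro"))
```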
## 🔧 Technical Details
### Training procedure
#### Training hyperparameters
The following hyperparameters were used during training:
- `learning_rate`: 3e-05
- `train_batch_size`: 32
- `eval_batch_size`: 32
- `seed`: 0
- `gradient_accumulation_steps`: 4
- `total_train_batch_size`: 128
- `optimizer`: Adam with betas=(0.9,0.999) and epsilon=1e-08
- `lr_scheduler_type`: linear
- `lr_scheduler_warmup_ratio`: 0.1
- `num_epochs`: 10.0
- `mixed_precision_training`: Native AMP
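These settings map directly onto 🤗 Transformers' `TrainingArguments`; a sketch follows (the `output_dir` is a placeholder, and `fp16=True` is the assumed way Native AMP was enabled):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="distil-ast-audioset",  # placeholder output path
    learning_rate=3e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=0,
    gradient_accumulation_steps=4,     # 32 x 4 = 128 effective batch size
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=10.0,
    fp16=True,                         # Native AMP mixed precision
)
```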
#### Training results
| Training Loss | Epoch | Step | Validation Loss | F1 | ROC AUC | Accuracy | mAP |
|---------------|-------|------|-----------------|----|---------|----------|-----|
| 1.5521 | 1.0 | 153 | 0.7759 | 0.3929 | 0.6789 | 0.0209 | 0.3394 |
| 0.7088 | 2.0 | 306 | 0.5183 | 0.4480 | 0.7162 | 0.0349 | 0.4047 |
| 0.484 | 3.0 | 459 | 0.4342 | 0.4673 | 0.7241 | 0.0447 | 0.4348 |
| 0.369 | 4.0 | 612 | 0.3847 | 0.4777 | 0.7332 | 0.0504 | 0.4463 |
| 0.2943 | 5.0 | 765 | 0.3587 | 0.4838 | 0.7284 | 0.0572 | 0.4556 |
| 0.2446 | 6.0 | 918 | 0.3415 | 0.4875 | 0.7296 | 0.0608 | 0.4628 |
| 0.2099 | 7.0 | 1071 | 0.3273 | 0.4896 | 0.7246 | 0.0648 | 0.4682 |
| 0.186 | 8.0 | 1224 | 0.3140 | 0.4888 | 0.7171 | 0.0689 | 0.4711 |
| 0.1693 | 9.0 | 1377 | 0.3101 | 0.4887 | 0.7157 | 0.0703 | 0.4741 |
| 0.1582 | 10.0 | 1530 | 0.3063 | 0.4876 | 0.7140 | 0.0714 | 0.4743 |
## 📄 License
This project is licensed under the Apache-2.0 license.
## ⚠️ Important Note
Consider the biases of the pre-training datasets, which may carry over into this model's predictions.
## 👥 Authors
Distil Audio Spectrogram Transformer AudioSet was trained and evaluated by Ananto Joyoadikusumo, David Samuel Setiawan, and Wilson Wongso. All computation and development were done on Google Cloud.
## 🔗 Framework versions
- Transformers 4.27.0.dev0
- PyTorch 1.13.1+cu117
- Datasets 2.10.0
- Tokenizers 0.13.2