# Audio classification

Ast Finetuned Audioset 10 10 0.4593 ONNX
This is the ONNX version of the AST (Audio Spectrogram Transformer) model, designed specifically for audio classification tasks and fine-tuned on the AudioSet dataset.
Audio Classification Transformers
onnx-community
684
1
Ast Finetuned Audioset 10 10 0.4593 Finetuned Gtzan
BSD-3-Clause
An audio classification model based on the Audio Spectrogram Transformer (AST) architecture. It was pre-trained on the AudioSet dataset and then fine-tuned on the GTZAN music genre classification dataset.
Audio Classification Transformers
wkCircle
8
0
Frugal Ai Space
An audio classification model based on the wav2vec2 architecture, suitable for climate-related sound classification tasks.
Audio Classification Transformers English
dannywillowliu
3
0
Seamless M4t V2 Large Speech Encoder
Speech encoder module extracted from SeamlessM4Tv2-Large, suited to cross-lingual and multilingual sequence-level audio classification tasks.
Audio Classification Transformers Supports Multiple Languages
WueNLP
67
3
Vietnamese Regional Accent Classification Model
This is an audio classification model for classifying Vietnamese dialects, achieving an F1 score of 0.8217 on the evaluation set.
Audio Classification Transformers
thangtrungnguyen
36
0
Baby Cry Classification Finetuned Babycry V4
Apache-2.0
A baby cry classification model fine-tuned from wav2vec2-large-xlsr-53-english, achieving 81.5% accuracy.
Audio Classification Transformers
Wiam
120
2
Detect Language
Apache-2.0
A language identification model fine-tuned from the Whisper Medium model for language classification on the FLEURS dataset.
Audio Classification Transformers
apparaomulpuriril
15
0
Doa Model TL4
OpenRAIL
This model estimates the direction of arrival (DOA) of fixed sound sources. It was trained on the SOFA dataset, with the AST model as the fine-tuning base.
Audio Classification Transformers English
FidelOdok
15
0
Wav2vec2 Base Finetuned Ks
Apache-2.0
A speech command (keyword spotting) classification model fine-tuned from facebook/wav2vec2-base on the speech_commands dataset, achieving 97.8% accuracy.
Audio Classification Transformers
Dc26
23
2
0 9up Ast Ft
BSD-3-Clause
An audio classification model fine-tuned from MIT/ast-finetuned-speech-commands-v2 on a spoken-digit speech commands dataset, primarily used for recognizing the spoken digits 0-9.
Audio Classification Transformers
mazkooleg
19
0