
CED Base

Developed by mispeech
CED is a simple audio tagging model based on the Vision Transformer (ViT), achieving state-of-the-art performance on AudioSet.
Downloads: 1,318
Release date: 11/24/2023

Model Overview

CED is a Transformer model for audio classification that combines fast inference with strong accuracy.

Model Features

Simplified Fine-tuning
Applies batch normalization to the input Mel spectrograms, so the dataset mean and variance no longer need to be precomputed before fine-tuning.
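To illustrate why no precomputed statistics are needed, here is a minimal pure-Python sketch of batch normalization applied per Mel bin, with the mean and variance pooled over the batch and time axes during training and running averages kept for inference. The class name `MelBatchNorm`, the per-bin layout, and the `momentum`/`eps` values are assumptions for illustration, not CED's actual implementation, and the learnable scale/shift parameters are omitted.

```python
import math
from typing import List

Spectrogram = List[List[float]]  # [n_mels][time] log-Mel values


class MelBatchNorm:
    """Hypothetical per-Mel-bin batch norm (illustrative, not CED's code)."""

    def __init__(self, n_mels: int, momentum: float = 0.01, eps: float = 1e-5):
        self.momentum = momentum
        self.eps = eps
        # Running statistics, used instead of batch statistics at inference.
        self.running_mean = [0.0] * n_mels
        self.running_var = [1.0] * n_mels

    def __call__(self, batch: List[Spectrogram], training: bool = True) -> List[Spectrogram]:
        n_mels = len(batch[0])
        out: List[Spectrogram] = [[[] for _ in range(n_mels)] for _ in batch]
        for m in range(n_mels):
            if training:
                # Pool every time frame of Mel bin m across the whole batch.
                values = [v for spec in batch for v in spec[m]]
                mean = sum(values) / len(values)
                var = sum((v - mean) ** 2 for v in values) / len(values)
                # Exponential moving average of the batch statistics.
                self.running_mean[m] += self.momentum * (mean - self.running_mean[m])
                self.running_var[m] += self.momentum * (var - self.running_var[m])
            else:
                mean, var = self.running_mean[m], self.running_var[m]
            scale = 1.0 / math.sqrt(var + self.eps)
            for b, spec in enumerate(batch):
                out[b][m] = [(v - mean) * scale for v in spec[m]]
        return out
```

Because the statistics come from each training batch, fine-tuning on a new dataset simply starts feeding spectrograms through the layer; there is no separate pass over the data to compute a global mean and variance.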
Supports Variable-Length Input
Most models use static time-frequency positional encodings, which generalize poorly to clips shorter than 10 seconds. CED is designed to overcome this limitation and handle variable-length input.
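One common way to avoid tying a model to a fixed 10-second grid is to keep separate time and frequency positional embeddings and add only as many time positions as the clip actually covers. Whether CED uses exactly this scheme is an assumption here; the function name `add_positional_embeddings` and the 62 x 4 patch grid (64 Mel bins, about 1000 frames, 16x16 patches) are hypothetical choices for the sketch.

```python
from typing import List

Vector = List[float]


def add_positional_embeddings(
    patches: List[List[Vector]],  # [n_freq][n_time][dim] patch embeddings
    time_pos: List[Vector],       # [max_time][dim] table sized for 10 s clips
    freq_pos: List[Vector],       # [n_freq][dim] one embedding per frequency row
) -> List[List[Vector]]:
    """Add separate time and frequency embeddings, slicing the time table
    to the clip's length (hypothetical sketch, not CED's code)."""
    n_freq, n_time = len(patches), len(patches[0])
    if n_time > len(time_pos):
        raise ValueError("clip is longer than the positional table")
    if n_freq != len(freq_pos):
        raise ValueError("frequency patch count mismatch")
    return [
        [
            # Patch embedding + its time position + its frequency position.
            [x + t_e + f_e for x, t_e, f_e in zip(patches[f][t], time_pos[t], freq_pos[f])]
            for t in range(n_time)
        ]
        for f in range(n_freq)
    ]
```

Under these assumptions a 5-second clip yields 31 time patches and uses only the first 31 rows of the time table, so its embeddings match those of the first half of a 10-second clip instead of being interpolated or padded.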
Training/Inference Acceleration
Uses 64-dimensional Mel filter banks and 16x16 non-overlapping patches, which significantly speeds up training and inference compared to AST (Audio Spectrogram Transformer) models.
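The speed-up follows largely from the number of patch tokens the Transformer must attend over. The sketch below counts them for a roughly 10-second clip. The CED frame count of 1000 is an assumption, and the AST setting (128 Mel bins, 1024 frames, 16x16 patches with stride 10, i.e. an overlap of 6) is the configuration assumed here for comparison.

```python
def num_patches(n_mels: int, n_frames: int, patch: int = 16, stride: int = 16) -> int:
    """Number of patch tokens from sliding a patch x patch window with the given stride."""
    freq = (n_mels - patch) // stride + 1
    time = (n_frames - patch) // stride + 1
    return freq * time


# Assumed inputs for a ~10 s clip.
ced = num_patches(n_mels=64, n_frames=1000, patch=16, stride=16)   # non-overlapping
ast = num_patches(n_mels=128, n_frames=1024, patch=16, stride=10)  # overlapping
print(ced, ast, round(ast / ced, 1))  # 248 1212 4.9
```

Since self-attention cost grows with the square of the sequence length, about 4.9x fewer tokens translates to roughly 24x fewer attention scores per layer under these assumptions.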
Performance Advantage
A CED model with only 10M parameters outperforms most previous models with around 80M parameters.

Model Capabilities

Audio Classification
Audio Tagging

Use Cases

Audio Recognition
Finger Snap Recognition
Accurately identifies finger snap sounds in audio clips.
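Audio tagging is a multi-label task: a clip can contain several sound events at once, so each class typically gets an independent sigmoid probability rather than a softmax over all classes. The sketch below turns per-class logits into tags with a threshold. The function `tag_clip`, the three-label vocabulary, the logit values, and the 0.5 threshold are hypothetical; a real AudioSet model outputs scores for its full label set.

```python
import math
from typing import Dict, List


def tag_clip(logits: List[float], labels: List[str], threshold: float = 0.5) -> Dict[str, float]:
    """Return every label whose sigmoid probability reaches the threshold,
    sorted from most to least confident (hypothetical helper)."""
    tags = {}
    for label, z in zip(labels, logits):
        p = 1.0 / (1.0 + math.exp(-z))  # independent per-class probability
        if p >= threshold:
            tags[label] = p
    return dict(sorted(tags.items(), key=lambda kv: kv[1], reverse=True))


labels = ["Speech", "Music", "Finger snapping"]
logits = [-2.0, 0.3, 3.1]  # hypothetical model outputs for one clip
print(tag_clip(logits, labels))
```

With these made-up scores the clip is tagged with both "Finger snapping" and "Music", which a single-label softmax classifier could not express.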