CED Small
CED is a simple audio tagging model based on the Vision Transformer (ViT) architecture that achieves state-of-the-art performance on AudioSet.
Release Time: 11/24/2023
Model Overview
CED is a Transformer model for audio classification, optimized specifically for audio tagging. It supports variable-length input and simplifies fine-tuning.
Model Features
Simplified Fine-Tuning
Batch normalization is applied to the Mel spectrograms, which eliminates the need to precompute the dataset mean and variance before fine-tuning.
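To make the idea concrete, the sketch below normalizes each Mel bin with statistics taken from the current batch, so no dataset-wide mean or variance has to be prepared in advance. It is a minimal pure-Python illustration and not the CED implementation: the function name `batch_norm_mel`, the choice to normalize per Mel bin, and the `eps` value are assumptions, and a real batch-norm layer would also carry learnable scale/shift parameters and running statistics for inference.

```python
import math

def batch_norm_mel(batch, eps=1e-5):
    """Normalize each Mel bin with statistics from the current batch.

    batch: list of spectrograms, each a list of Mel bins (rows),
           each row a list of frame values. Clips may differ in length.
    """
    n_mels = len(batch[0])
    normalized = [[None] * n_mels for _ in batch]
    for m in range(n_mels):
        # Pool every frame of Mel bin m across all clips in the batch.
        values = [v for spec in batch for v in spec[m]]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        scale = 1.0 / math.sqrt(var + eps)
        for i, spec in enumerate(batch):
            normalized[i][m] = [(v - mean) * scale for v in spec[m]]
    return normalized

# Two toy "spectrograms" with 2 Mel bins and different lengths (hypothetical values).
toy_batch = [
    [[-80.0, -60.0, -40.0], [-20.0, -10.0, 0.0]],
    [[-70.0, -50.0], [-15.0, -5.0]],
]
normed = batch_norm_mel(toy_batch)
```

After normalization, the pooled values of each Mel bin have approximately zero mean and unit variance, whatever the scale of the input features.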
Variable-Length Input Support
Removes the fixed 10-second segment limitation common to earlier audio Transformers, which improves generalization to clips of other lengths.
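One common way to accept clips of different lengths is to keep a position table sized for the longest supported clip and crop it to the number of time patches actually present, instead of padding every clip to 10 seconds. The sketch below shows that cropping step only. Whether CED works exactly this way is an assumption here, since the text above does not describe the mechanism, and `add_time_position` and the toy sizes are hypothetical.

```python
def add_time_position(patch_tokens, time_pos_embed):
    """Add a position vector to each time patch, cropping the table to fit.

    patch_tokens:   list of T token vectors (T varies with clip length).
    time_pos_embed: table of MAX_T position vectors, with MAX_T >= T.
    """
    num_patches = len(patch_tokens)
    if num_patches > len(time_pos_embed):
        raise ValueError("clip is longer than the position table covers")
    cropped = time_pos_embed[:num_patches]  # use only the first T entries
    return [
        [t + p for t, p in zip(token, pos)]
        for token, pos in zip(patch_tokens, cropped)
    ]

# Hypothetical sizes: a table for 6 time patches, a clip with only 3.
pos_table = [[0.1 * i, -0.1 * i] for i in range(6)]
short_clip = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
out = add_time_position(short_clip, pos_table)
```

The short clip yields three tokens rather than six, so downstream attention only processes the patches that carry real audio.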
Efficient Training/Inference
An optimized chunking strategy significantly reduces computational cost compared with AST (Audio Spectrogram Transformer) models.
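The effect of how a spectrogram is cut into patches can be estimated with simple arithmetic. The sketch below counts patches for two configurations of a 10-second clip. The specific settings (64 Mel bins with non-overlapping 16x16 patches versus 128 Mel bins with 16x16 patches at stride 10, and the frame counts) are assumptions that do not appear in the text above, and `num_patches` is a hypothetical helper.

```python
def num_patches(n_mels, n_frames, patch=16, stride=16):
    """Count patches produced by sliding a square window over a spectrogram."""
    along_freq = (n_mels - patch) // stride + 1
    along_time = (n_frames - patch) // stride + 1
    return along_freq * along_time

# Assumed settings for a 10-second clip (not stated on this page):
# non-overlapping 16x16 patches on 64 Mel bins x 1001 frames, versus
# AST-style overlapping 16x16 patches (stride 10) on 128 Mel bins x 1024 frames.
compact = num_patches(64, 1001, patch=16, stride=16)
overlapping = num_patches(128, 1024, patch=16, stride=10)

# Self-attention cost grows roughly with the square of the sequence length.
attention_ratio = (overlapping / compact) ** 2
print(compact, overlapping, round(attention_ratio, 1))
```

Under these assumptions the compact configuration produces 248 patches against 1212 for the overlapping one, so the attention layers handle a sequence about five times shorter.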
High-Performance Compact Model
The 10M-parameter CED model outperforms most 80M-parameter solutions.
Model Capabilities
Audio Classification
Audio Tagging
Sound Event Detection
Use Cases
Sound Recognition
Environmental Sound Classification
Identify various types of environmental sounds
Achieves 49.6 mAP on AudioSet
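For readers unfamiliar with the metric, mean average precision (mAP) averages a per-class ranking score over all classes. The following is a simplified pure-Python sketch on toy data, not the evaluation code behind the reported figure: it ignores score ties, and it scores a class with no positive clips as 0, which is a choice made for this illustration.

```python
def average_precision(labels, scores):
    """Average precision for one class: mean precision at each true positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(label_matrix, score_matrix):
    """Average the per-class AP; rows are clips, columns are classes."""
    n_classes = len(label_matrix[0])
    aps = []
    for c in range(n_classes):
        labels = [row[c] for row in label_matrix]
        scores = [row[c] for row in score_matrix]
        aps.append(average_precision(labels, scores))
    return sum(aps) / n_classes

# Toy data: 4 clips, 2 classes (hypothetical scores).
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_score = [[0.9, 0.2], [0.8, 0.7], [0.6, 0.4], [0.1, 0.5]]
map_value = mean_average_precision(y_true, y_score)
```

In the toy data each class has one positive ranked first and one ranked third, giving a per-class AP of (1/1 + 2/3) / 2 and therefore an mAP of about 0.83.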
Specific Sound Detection
Detect specific sound events like finger snaps
Recognizes the 527 sound categories of AudioSet
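Because a clip can contain several sounds at once, audio tagging is usually treated as a multi-label problem: each category receives its own probability, and every category above a threshold is reported. The sketch below illustrates that decoding step. The logits, the 0.5 threshold, the label names, and the `tag_clip` helper are hypothetical and are not taken from the CED code.

```python
import math

def tag_clip(logits, class_names, threshold=0.5):
    """Turn per-class logits into (label, probability) tags.

    Tagging is multi-label, so each class gets an independent sigmoid
    instead of a softmax across classes.
    """
    tags = []
    for name, logit in zip(class_names, logits):
        prob = 1.0 / (1.0 + math.exp(-logit))
        if prob >= threshold:
            tags.append((name, prob))
    # Most confident tags first.
    return sorted(tags, key=lambda item: item[1], reverse=True)

# Hypothetical logits for 4 of the 527 categories.
names = ["Finger snapping", "Speech", "Music", "Dog"]
logits = [2.2, 0.4, -3.0, -1.5]
tags = tag_clip(logits, names)
```

Here two categories pass the threshold, so the clip is tagged with both "Finger snapping" and "Speech". Lowering the threshold trades precision for recall.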