Vit Spectrogram
A spectrogram classification model based on the Vision Transformer (ViT) architecture that classifies speaker gender (male/female) from audio spectrograms.
Downloads 24
Release Time: 7/6/2022
Model Overview
This model is a Vision Transformer fine-tuned from the google/vit-base-patch16-224-in21k pre-trained checkpoint on Mel spectrogram data. Its primary use is classifying speaker gender from audio.
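To make the audio-to-image step concrete, here is a minimal NumPy sketch of one way to turn a waveform into a Mel spectrogram and shape it as a 224x224, 3-channel array, the input size implied by the checkpoint name. The FFT size, hop length, number of Mel bands, min-max scaling, and nearest-neighbour resize are all assumptions chosen for illustration; the preprocessing actually used for this model is not stated here.

```python
import numpy as np

def hz_to_mel(f):
    # Common HTK-style Hz -> mel conversion.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters whose centres are evenly spaced on the mel scale.
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, ctr, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (ctr - lo)
        falling = (hi - fft_freqs) / (hi - ctr)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def mel_spectrogram_db(signal, sr, n_fft=1024, hop=256, n_mels=64):
    # Assumed parameters: 1024-point FFT, hop of 256 samples, 64 mel bands.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, n_fft//2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T     # (n_frames, n_mels)
    return 10.0 * np.log10(np.maximum(mel, 1e-10)).T      # (n_mels, n_frames), in dB

def to_vit_input(spec_db, size=224):
    # Min-max scale to [0, 1], nearest-neighbour resize, then copy to 3 channels.
    lo, hi = spec_db.min(), spec_db.max()
    scaled = (spec_db - lo) / (hi - lo + 1e-12)
    rows = np.arange(size) * spec_db.shape[0] // size
    cols = np.arange(size) * spec_db.shape[1] // size
    resized = scaled[rows][:, cols]
    return np.repeat(resized[:, :, None], 3, axis=2).astype(np.float32)

# Usage with a synthetic one-second 440 Hz tone standing in for speech.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
image = to_vit_input(mel_spectrogram_db(tone, sr))
print(image.shape)        # (224, 224, 3)
print((224 // 16) ** 2)   # 196 patches of 16x16 pixels
```

The last line reflects the "patch16-224" part of the checkpoint name: a 224x224 input split into 16x16 patches yields 196 patch tokens for the transformer.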
Model Features
Fine-tuned from pre-trained ViT model
Initialized with google/vit-base-patch16-224-in21k pre-trained weights and fine-tuned on spectrogram data
Efficient spectrogram processing
Processes and classifies audio converted to Mel spectrograms using Vision Transformer
Mixed precision training
Trained with the mixed_float16 precision policy to balance computational efficiency and model accuracy
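As background on the mixed precision feature: in Keras, the mixed_float16 policy runs computations in float16 while keeping variables in float32. The NumPy sketch below illustrates why float16 alone can be risky and why such setups are typically paired with loss scaling. The gradient value and scale factor are hypothetical, and whether this model's training used loss scaling is not stated in the source.

```python
import numpy as np

# A small gradient value that float32 can represent (hypothetical example value).
grad = np.float32(1e-8)

# Cast directly to float16: the value underflows, because it is below
# float16's smallest positive subnormal (about 6e-8).
print(np.float16(grad))   # 0.0

# Loss scaling: multiply before the float16 cast, then unscale in float32.
loss_scale = np.float32(2.0 ** 15)            # hypothetical scale factor
scaled = np.float16(grad * loss_scale)        # now well inside float16's range
recovered = np.float32(scaled) / loss_scale   # back to roughly 1e-8

print(recovered)
```

The recovered value matches the original to within float16's relative precision, which is the trade-off the feature description refers to: cheaper half-precision arithmetic with float32 used where accuracy matters.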
Model Capabilities
Audio spectrogram analysis
Gender classification (male/female)
Mel spectrogram feature extraction
Use Cases
Audio analysis
Speech gender recognition
Determines speaker gender by analyzing speech spectrograms
Validation set accuracy: 93.66%
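For readers unfamiliar with how a two-class classifier's output becomes a label and an accuracy figure, here is a small sketch in plain Python. The label order in ID2LABEL is hypothetical (the model's actual id-to-label mapping is not given here), and the logits and label lists are made-up values for illustration, not results from this model.

```python
import math

# Hypothetical label order; the real mapping is not given in the source.
ID2LABEL = {0: "female", 1: "male"}

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_label(logits):
    # Pick the class with the highest probability.
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[idx], probs[idx]

def accuracy(predicted, expected):
    # Fraction of predictions that match the reference labels.
    return sum(p == e for p, e in zip(predicted, expected)) / len(expected)

label, confidence = predict_label([0.2, 1.5])   # made-up logits
print(label)                                    # male

print(accuracy(["male", "female", "male", "male"],
               ["male", "female", "female", "male"]))   # 0.75
```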