Vit Spectrogram

Developed by prashanth0205
A spectrogram classification model based on the Vision Transformer architecture that classifies speaker gender (male/female) from audio spectrograms.
Downloads 24
Release Time: 7/6/2022

Model Overview

This model is a Vision Transformer fine-tuned from the google/vit-base-patch16-224-in21k pre-trained checkpoint and adapted to Mel spectrogram input. Its primary use is gender classification from audio.
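The checkpoint name encodes the input geometry: 16x16 pixel patches on a 224x224 image. The short sketch below works out what that means for the sequence the transformer sees. It assumes the spectrogram is rendered as a 3-channel image and that one classification token is prepended, as in the standard ViT design; the function name is hypothetical and not part of any library.

```python
from typing import Tuple

# Values read off the checkpoint name google/vit-base-patch16-224-in21k.
IMAGE_SIZE = 224  # input height and width in pixels
PATCH_SIZE = 16   # side length of each square patch
CHANNELS = 3      # assumption: spectrograms are rendered as 3-channel images


def vit_sequence_shape(image_size: int, patch_size: int, channels: int) -> Tuple[int, int]:
    """Return (number of tokens, values per flattened patch) for a ViT input."""
    if image_size % patch_size != 0:
        raise ValueError("image size must be divisible by patch size")
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side ** 2
    num_tokens = num_patches + 1  # one extra classification token
    patch_dim = patch_size * patch_size * channels
    return num_tokens, patch_dim


num_tokens, patch_dim = vit_sequence_shape(IMAGE_SIZE, PATCH_SIZE, CHANNELS)
print(num_tokens, patch_dim)  # 197 768
```

In practice this means any spectrogram, whatever its original time and frequency resolution, has to be resized to 224x224 before it reaches the model.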

Model Features

Fine-tuned from pre-trained ViT model
Initialized with google/vit-base-patch16-224-in21k pre-trained weights and fine-tuned on spectrogram data
Efficient spectrogram processing
Converts audio to Mel spectrograms, then classifies them with a Vision Transformer
Mixed precision training
Uses mixed_float16 precision during training to balance computational efficiency and model accuracy
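The name mixed_float16 matches the Keras mixed-precision policy, under which computation runs in float16 while variables are kept in float32. Assuming that is the setup used here, the standard-library sketch below illustrates why the "mixed" part matters: a small weight update vanishes when accumulated in float16. The helper name and the update size are illustrative, and a Python float (float64) stands in for the float32 master copy.

```python
import struct


def to_float16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


weight = 1.0
update = 1e-4  # hypothetical small gradient step

# Accumulating in float16: the spacing between representable values near 1.0
# is 2**-10 (about 0.00098), so an update of 1e-4 is rounded away.
fp16_weight = to_float16(to_float16(weight) + to_float16(update))

# Accumulating in higher precision keeps the update.
full_weight = weight + update

print(fp16_weight)            # 1.0
print(full_weight > weight)   # True
```

Keeping the variables in higher precision lets the forward and backward passes use the faster float16 arithmetic without losing small updates like this one.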

Model Capabilities

Audio spectrogram analysis
Gender classification (male/female)
Mel spectrogram feature extraction
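As a rough illustration of what "Mel" means in the list above, the sketch below spaces frequency bands evenly on the mel scale using one common conversion formula (the HTK-style one). The page does not state which Mel parameters the model was trained with, so the band count, frequency range, and function names here are assumptions chosen only for the example.

```python
import math
from typing import List


def hz_to_mel(freq_hz: float) -> float:
    """Convert a frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_mels: int, fmin_hz: float, fmax_hz: float) -> List[float]:
    """Center frequencies (Hz) of n_mels bands spaced evenly on the mel scale."""
    mel_min, mel_max = hz_to_mel(fmin_hz), hz_to_mel(fmax_hz)
    step = (mel_max - mel_min) / (n_mels + 1)
    return [mel_to_hz(mel_min + step * (i + 1)) for i in range(n_mels)]


# Illustrative values only: 8 bands between 0 Hz and 8000 Hz.
centers = mel_center_frequencies(n_mels=8, fmin_hz=0.0, fmax_hz=8000.0)
for hz in centers:
    print(round(hz, 1))
```

The bands come out narrow at low frequencies and wide at high ones, which gives the low-frequency region, where voice pitch lies, more resolution in the resulting image.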

Use Cases

Audio analysis
Speech gender recognition
Determines speaker gender by analyzing speech spectrograms
Validation set accuracy: 93.66%