Open-source vit_spectrogram model - Classifies speaker gender (male/female) from audio spectrograms

Vit Spectrogram

Developed by prashanth0205

A spectrogram classification model based on the Vision Transformer (ViT) architecture. It classifies speaker gender (male/female) from audio spectrograms.

Audio Classification

Transformers

Open Source License: Apache-2.0

#Spectrogram Classification #Gender Recognition #ViT Fine-tuning

Downloads 24

Release Time: 7/6/2022

Model Overview

This model is a Vision Transformer (ViT) fine-tuned from the google/vit-base-patch16-224-in21k pre-trained checkpoint on Mel spectrogram data. Its primary use is audio gender classification.
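The base checkpoint name encodes the input geometry: 16 x 16 pixel patches on a 224 x 224 image. The sketch below works out what that implies for the token sequence the transformer sees. It assumes 3-channel input, as the pre-trained ViT expects; how the single-channel spectrograms were resized and mapped to three channels for this model is not stated here, and `vit_sequence_shape` is a hypothetical helper written for illustration.

```python
# Geometry implied by the checkpoint name google/vit-base-patch16-224-in21k.
IMAGE_SIZE = 224   # input height and width in pixels
PATCH_SIZE = 16    # each patch covers 16 x 16 pixels
CHANNELS = 3       # assumption: RGB-style input, as in the pre-trained ViT


def vit_sequence_shape(image_size, patch_size, channels):
    """Return (num_patches, sequence_length, values_per_patch)."""
    if image_size % patch_size != 0:
        raise ValueError("image size must be a multiple of the patch size")
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side ** 2
    # One extra [CLS] token is prepended and used for classification.
    sequence_length = num_patches + 1
    # Each patch is flattened before the linear projection.
    values_per_patch = patch_size * patch_size * channels
    return num_patches, sequence_length, values_per_patch


num_patches, sequence_length, values_per_patch = vit_sequence_shape(
    IMAGE_SIZE, PATCH_SIZE, CHANNELS
)
print(num_patches, sequence_length, values_per_patch)  # 196 197 768
```

A spectrogram therefore has to be brought to 224 x 224 before it reaches the model, so long recordings are either cropped or rescaled along the time axis.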

Model Features

Fine-tuned from pre-trained ViT model

Initialized with google/vit-base-patch16-224-in21k pre-trained weights and fine-tuned on spectrogram data

Efficient spectrogram processing

Classifies audio that has been converted to Mel spectrograms, treating each spectrogram as an image input to the Vision Transformer

Mixed precision training

Trained with mixed_float16 precision, which speeds up computation and reduces memory use while aiming to preserve model accuracy
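In Keras, `mixed_float16` is the name of a mixed precision policy: computations run in float16 while variables stay in float32. The sketch below is not the author's training code. It uses only the standard library to illustrate, with made-up numbers, two float16 effects that motivate keeping float32 weights and scaling the loss: small updates being rounded away and tiny gradients underflowing to zero.

```python
import struct


def to_float16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


# 1) A small weight update is lost if the weight itself is stored in float16.
updated_weight = to_float16(1.0 + 1e-4)
print(updated_weight)  # 1.0

# 2) A tiny gradient underflows to zero in float16 ...
tiny_gradient = 1e-8   # hypothetical gradient magnitude
unscaled = to_float16(tiny_gradient)
print(unscaled)        # 0.0

# ... but survives if it is scaled up first and unscaled in higher precision.
loss_scale = 1024.0    # hypothetical fixed loss scale
scaled = to_float16(tiny_gradient * loss_scale)
recovered = scaled / loss_scale
print(recovered > 0.0)  # True
```

The recovered value is only an approximation of the original gradient, which is the accuracy side of the trade-off.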

Model Capabilities

Audio spectrogram analysis

Gender classification (male/female)

Mel spectrogram feature extraction
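A Mel spectrogram groups frequencies into bands that are evenly spaced on the mel scale rather than in Hz, so low frequencies get finer resolution than high ones. The sketch below shows that spacing using the common HTK-style mel formula. The band count and frequency range are illustrative assumptions, since the spectrogram settings used for this model are not given here, and `mel_band_centers` is a hypothetical helper.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_mels, f_min, f_max):
    """Center frequencies (Hz) of n_mels bands spaced evenly on the mel scale."""
    mel_min, mel_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (mel_max - mel_min) / (n_mels + 1)
    return [mel_to_hz(mel_min + step * (i + 1)) for i in range(n_mels)]


# Assumed settings for illustration only: 8 bands between 0 Hz and 8 kHz.
centers = mel_band_centers(n_mels=8, f_min=0.0, f_max=8000.0)
print([round(c) for c in centers])
```

The printed centers sit close together at the low end and spread out toward 8 kHz, which is the region weighting a Mel spectrogram image carries into the classifier.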

Use Cases

Audio analysis

Speech gender recognition

Determines speaker gender by analyzing speech spectrograms

Validation set accuracy: 93.66%
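For a two-class model like this one, the final step is turning the classifier's two output logits into a label and a confidence. The sketch below shows that step with a plain softmax. The label order in `ID2LABEL` and the example logits are hypothetical; the model's actual id-to-label mapping should be read from its configuration.

```python
import math

# Hypothetical label order; the real mapping is not stated here.
ID2LABEL = {0: "female", 1: "male"}


def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits):
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


label, confidence = predict_label([-1.2, 2.3])  # made-up logits
print(label, round(confidence, 3))
```

Accuracy on a validation set is then the fraction of spectrograms whose predicted label matches the reference label.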


