Open-source model of gender_cls_svm_ecapa_voxceleb - Easily predict the gender of the speaker from audio!

Gender Cls Svm Ecapa Voxceleb

Developed by griko

Based on SpeechBrain's ECAPA-TDNN speaker embedding model and SVM classifier, it can predict speaker gender from audio input.

Audio Processing, Other. Open Source License: Apache-2.0. Tags: #High-precision voiceprint analysis #ECAPA-TDNN embeddings #Multi-dataset validation


Release Time: 11/9/2024

Model Overview

This model combines ECAPA-TDNN speaker embeddings with an SVM classifier to identify speaker gender from audio, supporting binary classification (male/female).

Model Features

High-precision classification

Achieves 98.9% accuracy on the VoxCeleb2 test set and 99.6% accuracy on the TIMIT test set.

Multi-dataset validation

Performance validated on VoxCeleb2, Mozilla Common Voice, and TIMIT datasets.

Optimized classifier

The SVM classifier's hyperparameters were tuned with Optuna over 200 trials.
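A minimal sketch of tuning an SVM on 192-dimensional embeddings, assuming scikit-learn is available. It is not the author's training code: Optuna's sampler is swapped for plain random search to keep dependencies light, and the RBF kernel, the C and gamma ranges, the StandardScaler step, and the hypothetical `synthetic_embeddings` helper (toy Gaussian clusters standing in for real ECAPA-TDNN embeddings) are all illustrative assumptions. The card does not publish the tuned hyperparameters.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def synthetic_embeddings(n_per_class: int, dim: int = 192, seed: int = 0):
    """Hypothetical toy stand-in for ECAPA-TDNN embeddings: two shifted Gaussian clusters."""
    rng = np.random.default_rng(seed)
    X = np.vstack([
        rng.normal(-0.5, 1.0, size=(n_per_class, dim)),
        rng.normal(0.5, 1.0, size=(n_per_class, dim)),
    ])
    y = np.array(["female"] * n_per_class + ["male"] * n_per_class)
    return X, y


def tune_svm(X_train, y_train, X_val, y_val, n_trials: int = 20, seed: int = 0):
    """Random search over RBF-SVM C and gamma, scored by validation accuracy."""
    rng = np.random.default_rng(seed)
    best_score, best_model, best_params = -1.0, None, None
    for _ in range(n_trials):
        # Sample C and gamma log-uniformly (ranges are assumptions, not the card's).
        params = {
            "C": float(10 ** rng.uniform(-2, 2)),
            "gamma": float(10 ** rng.uniform(-4, 0)),
        }
        # Fit the scaler on training data only, then the SVM on scaled embeddings.
        model = make_pipeline(StandardScaler(), SVC(kernel="rbf", **params))
        model.fit(X_train, y_train)
        score = model.score(X_val, y_val)  # validation accuracy
        if score > best_score:
            best_score, best_model, best_params = score, model, params
    return best_model, best_params, best_score
```

Usage: `model, params, val_acc = tune_svm(*synthetic_embeddings(100, seed=0), *synthetic_embeddings(50, seed=1))`. Keeping the speaker-disjoint validation set separate from training, as the card describes, is what makes the selected hyperparameters meaningful.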

Automatic preprocessing

Supports automatic audio format conversion (16kHz/mono) and voice activity detection.
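The format conversion can be sketched with NumPy as a channel downmix followed by resampling to 16 kHz. This is an illustration under assumptions, not the package's code: `to_mono`, `resample_linear`, and `preprocess` are hypothetical helpers, and linear interpolation is used only for brevity, whereas production resamplers apply band-limited (anti-aliasing) filtering before downsampling.

```python
import numpy as np

TARGET_SR = 16_000  # target sampling rate stated in the model card


def to_mono(audio: np.ndarray) -> np.ndarray:
    """Downmix (samples, channels) audio to one channel by averaging channels."""
    if audio.ndim == 1:
        return audio.astype(np.float32)
    return audio.mean(axis=1).astype(np.float32)


def resample_linear(audio: np.ndarray, orig_sr: int, target_sr: int = TARGET_SR) -> np.ndarray:
    """Resample a 1-D signal by linear interpolation (no anti-aliasing filter)."""
    if orig_sr == target_sr:
        return audio
    n_out = int(round(len(audio) / orig_sr * target_sr))
    # Sample instants of the input and output signals, in seconds.
    t_in = np.arange(len(audio)) / orig_sr
    t_out = np.arange(n_out) / target_sr
    return np.interp(t_out, t_in, audio).astype(np.float32)


def preprocess(audio: np.ndarray, orig_sr: int) -> np.ndarray:
    """Downmix to mono, then resample to 16 kHz."""
    return resample_linear(to_mono(audio), orig_sr)
```

For example, one second of 44.1 kHz stereo audio (shape `(44100, 2)`) becomes a mono array of 16,000 samples.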

Model Capabilities

Gender classification

Speaker feature extraction

Audio processing

Voiceprint analysis

Use Cases

Speech analysis

Speaker gender recognition

Automatically identifies speaker gender from audio.

High accuracy (VoxCeleb2: 98.9%)

Speech dataset processing

Dataset gender labeling

Automatically adds gender labels to unlabeled speech datasets.

🚀 Gender Classification Model

This model combines the SpeechBrain ECAPA-TDNN speaker embedding model with an SVM classifier to predict speaker gender from audio input.

🚀 Quick Start

The classifier was trained on VoxCeleb2 and evaluated on the VoxCeleb2, Mozilla Common Voice v10.0, and TIMIT test sets.

✨ Features

Accurate Prediction: Achieves 92.3% to 99.6% accuracy on the VoxCeleb2, Mozilla Common Voice v10.0, and TIMIT test sets, with a 0.9885 F1-score on VoxCeleb2.
Well-defined Input and Output: Accepts audio files as input and outputs gender predictions.
Optimized Classifier: Uses a Support Vector Machine optimized through Optuna.

📦 Installation

You can install the package directly from GitHub:

pip install git+https://github.com/griko/voice-gender-classification.git

💻 Usage Examples

Basic Usage

from voice_gender_classification import GenderClassificationPipeline

# Load the pipeline
classifier = GenderClassificationPipeline.from_pretrained(
    "griko/gender_cls_svm_ecapa_voxceleb"
)

# Single file prediction
result = classifier("path/to/audio.wav")
print(result)  # ["female"] or ["male"]

# Batch prediction
results = classifier(["audio1.wav", "audio2.wav"])
print(results)  # e.g. ["female", "male"] (one label per input file)
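For the dataset-labeling use case, the batch call can be wrapped in a small loop. This sketch assumes, as the Basic Usage example suggests, that calling the pipeline on a list of paths returns one label per path in the same order; `label_dataset` and its `batch_size` parameter are hypothetical and not part of the package.

```python
from collections.abc import Callable, Sequence


def label_dataset(
    classify: Callable[[list[str]], list[str]],
    paths: Sequence[str],
    batch_size: int = 32,
) -> dict[str, str]:
    """Map each audio path to a predicted gender label, one batch at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    labels: dict[str, str] = {}
    for start in range(0, len(paths), batch_size):
        batch = list(paths[start:start + batch_size])
        predictions = classify(batch)
        # Guard against a classifier that drops or adds results.
        if len(predictions) != len(batch):
            raise RuntimeError(
                f"expected {len(batch)} predictions, got {len(predictions)}"
            )
        labels.update(zip(batch, predictions))
    return labels
```

With the real pipeline, `classify` would be the object returned by `GenderClassificationPipeline.from_pretrained("griko/gender_cls_svm_ecapa_voxceleb")`; any callable with the same list-in, list-out shape works, which makes the loop easy to test with a stub.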

📚 Documentation

Model Details

Property	Details
Input	Audio file (converted to 16 kHz mono)
Output	Gender prediction ("male" or "female")
Speaker embedding	192-dimensional ECAPA-TDNN embedding from SpeechBrain
Classifier	Support Vector Machine optimized through Optuna (200 trials)
Performance	VoxCeleb2 test set: 98.9% accuracy, 0.9885 F1-score; Mozilla Common Voice v10.0 English validated test set: 92.3% accuracy; TIMIT test set: 99.6% accuracy
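To make the accuracy and F1 figures concrete, here is a small pure-Python sketch of how both are computed from true and predicted labels. The card does not state whether its 0.9885 F1-score is for one class or a macro average, so the sketch provides per-class F1 and the macro average; the function names are hypothetical.

```python
from collections.abc import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the reference label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> float:
    """F1-score treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str],
             labels: Sequence[str] = ("female", "male")) -> float:
    """Unweighted mean of the per-class F1-scores."""
    return sum(f1_for(y_true, y_pred, lab) for lab in labels) / len(labels)
```

For `y_true = ["female", "female", "male", "male"]` and `y_pred = ["female", "male", "male", "male"]`, accuracy is 0.75, the female F1 is 2/3, the male F1 is 0.8, and the macro F1 is 11/15 (about 0.733).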

Training Data

The model was trained on the VoxCeleb2 dataset:

Training set: 1,691 speakers (845 females, 846 males)
Validation set: 785 speakers (396 females, 389 males)
Test set: 1,647 speakers (828 females, 819 males)
No speaker overlap between sets
Audio preprocessing:
- Converted to WAV format, single channel, 16 kHz sampling rate, 256 kb/s bitrate
- Applied SileroVAD for voice activity detection, taking the first voiced segment
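The "first voiced segment" step can be sketched as follows. The sketch assumes VAD output shaped like the result of Silero VAD's `get_speech_timestamps`, a list of dicts with `"start"` and `"end"` sample offsets, and treats each range as half-open; the function name and the choice to raise an error when no speech is found are assumptions, not documented behavior of the training pipeline.

```python
import numpy as np


def first_voiced_segment(audio: np.ndarray, timestamps: list[dict[str, int]]) -> np.ndarray:
    """Return the samples of the earliest voiced segment found by a VAD.

    Assumes Silero-style timestamps [{"start": s, "end": e}, ...] in samples,
    interpreted here as half-open ranges [s, e).
    """
    if not timestamps:
        # Assumption: clips with no detected speech are rejected, not kept whole.
        raise ValueError("no voiced segment detected")
    # Pick the segment that starts earliest, even if the list is unsorted.
    first = min(timestamps, key=lambda ts: ts["start"])
    return audio[first["start"]:first["end"]]
```

Keeping only one segment per clip trades some speech for a cleaner, speech-only input to the embedding model.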

Limitations

The model was trained on celebrity voices from YouTube interviews.
Performance may vary on different audio qualities or recording conditions.
It is designed for binary gender classification only.

📄 License

This project is licensed under the Apache-2.0 license.

📚 Citation

If you use this model in your research, please cite:

@misc{koushnir2025vanpyvoiceanalysisframework,
      title={VANPY: Voice Analysis Framework}, 
      author={Gregory Koushnir and Michael Fire and Galit Fuhrmann Alpert and Dima Kagan},
      year={2025},
      eprint={2502.17579},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2502.17579}, 
}
