Wespeaker-VoxCeleb-ResNet293-LM Open-Source Model - Supports Tasks Such as Speaker Recognition and Similarity Calculation


Developed by Wespeaker

A speaker embedding model based on the ResNet293 architecture and refined with large margin fine-tuning. It supports speaker recognition, similarity scoring, and speaker diarization (segmenting speech by speaker).

Tags: Speaker Analysis, English, Speaker Recognition, Large Margin Fine-Tuning Optimization, Multi-Speaker Scenarios


Release Time: 12/28/2023

Model Overview

This model is provided by the Wespeaker project. It uses the ResNet293 architecture, refined with large margin fine-tuning, and is intended primarily for speaker recognition and related speech processing tasks. It was trained on the VoxCeleb2 development set, which contains 5994 speakers.

Model Features

Large Margin Fine-Tuning Optimization

Large margin fine-tuning lowers the equal error rate (EER) on vox1-O-clean from 0.595 to 0.532 without AS-Norm, and from 0.537 to 0.447 with AS-Norm (see Results on VoxCeleb).
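To make the idea of a "margin" concrete, the sketch below applies an additive angular margin to the true-speaker logit, in the style of AAM-softmax. The model card does not state which loss is used, so treating it as AAM-softmax is an assumption; the function names, the margin values, and the scale are illustrative only, and the edge case where the angle plus margin exceeds π is ignored.

```python
import math


def aam_logits(cosines, target, margin=0.5, scale=32.0):
    """Scaled logits with an additive angular margin on the target class.

    cosines: cosine similarity between one embedding and each class weight.
    target:  index of the true speaker class.
    margin, scale: illustrative values, not taken from the model card.
    """
    logits = []
    for i, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))  # keep acos within its domain
        if i == target:
            # Adding the margin to the angle lowers the true-class score,
            # so training has to pull the embedding closer to its class.
            c = math.cos(math.acos(c) + margin)
        logits.append(scale * c)
    return logits


def softmax_cross_entropy(logits, target):
    """Numerically stable cross-entropy for a single example."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[target]
```

With the same cosines, a larger margin yields a larger loss for the true class, which is what pushes same-speaker embeddings closer together during fine-tuning.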

Efficient Architecture

Built on the ResNet293 architecture, with 28.62M parameters and 28.10G FLOPs (see Results on VoxCeleb)

Multi-Task Support

Supports speaker embedding extraction, similarity scoring, and speaker diarization (speech segmentation by speaker)

Model Capabilities

Speaker Recognition

Speaker Similarity Calculation

Speech Segmentation

Speaker Enrollment and Recognition
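Enrollment and recognition can be pictured as a nearest-neighbour lookup over stored embeddings. The sketch below is a simplified stand-in, not Wespeaker's implementation: `SpeakerTable` and `cosine` are hypothetical names, and the embeddings are plain lists of floats rather than model outputs.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class SpeakerTable:
    """Hypothetical in-memory store mapping speaker names to embeddings."""

    def __init__(self):
        self.enrolled = {}

    def register(self, name, embedding):
        self.enrolled[name] = list(embedding)

    def recognize(self, embedding):
        """Return (name, score) of the enrolled speaker closest to the query."""
        if not self.enrolled:
            raise ValueError("no speakers enrolled")
        scores = {name: cosine(ref, embedding) for name, ref in self.enrolled.items()}
        best = max(scores, key=scores.get)
        return best, scores[best]
```

A real system would usually also apply a score threshold, so that a query from an unenrolled speaker is rejected instead of being assigned to the closest match.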

Use Cases

Voice Biometrics

Speaker Verification

Verify whether an audio sample belongs to a specific speaker

EER of 0.447% on the vox1-O-clean trial list, with large margin fine-tuning and AS-Norm
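The equal error rate is the operating point where the false-accept rate and the false-reject rate are equal. The sketch below estimates it from two lists of trial scores by sweeping every observed score as a threshold; `compute_eer` is an illustrative helper, not part of Wespeaker, and it returns a fraction (multiply by 100 for a percentage).

```python
def compute_eer(target_scores, nontarget_scores):
    """Return (eer, threshold) at the point where FAR and FRR are closest.

    target_scores:    scores of same-speaker trials.
    nontarget_scores: scores of different-speaker trials.
    A trial is accepted when its score is >= threshold.
    """
    best = None
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, threshold)
    return best[1], best[2]
```

This quadratic sweep is fine for a handful of trials; evaluation on full trial lists would normally sort the scores once and accumulate the error counts.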

Speech Analysis

Meeting Speech Segmentation

Identify and segment different speakers in meeting recordings
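Once a recording has been split into speaker-labelled segments, a common follow-up is to total the speaking time per speaker. The sketch below assumes each segment is a `(start_seconds, end_seconds, speaker_label)` tuple; that layout is a hypothetical choice for illustration and may differ from what the toolkit actually returns.

```python
from collections import defaultdict


def talk_time(segments):
    """Sum segment durations per speaker label.

    segments: iterable of (start_seconds, end_seconds, speaker_label) tuples
              (an assumed layout, not confirmed by the model card).
    """
    totals = defaultdict(float)
    for start, end, speaker in segments:
        if end < start:
            raise ValueError(f"segment ends before it starts: {(start, end, speaker)}")
        totals[speaker] += end - start
    return dict(totals)
```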

🚀 Wespeaker ResNet293-based r-vector Model

This is an official speaker embedding model provided by the Wespeaker project. It is a ResNet293 r-vector model (after large margin fine-tuning), trained on the VoxCeleb2 Dev dataset with 5994 speakers.

✨ Features

Official Model: Provided by the Wespeaker project.
Training Data: Trained on the VoxCeleb2 Dev dataset with 5994 speakers.

📦 Installation

Install Wespeaker

pip install git+https://github.com/wenet-e2e/wespeaker.git

Development Install

git clone https://github.com/wenet-e2e/wespeaker.git
cd wespeaker
pip install -e .

💻 Usage Examples

Command line Usage

$ wespeaker -p resnet293_download_dir --task embedding --audio_file audio.wav --output_file embedding.txt
$ wespeaker -p resnet293_download_dir --task embedding_kaldi --wav_scp wav.scp --output_file /path/to/embedding
$ wespeaker -p resnet293_download_dir --task similarity --audio_file audio.wav --audio_file2 audio2.wav
$ wespeaker -p resnet293_download_dir --task diarization --audio_file audio.wav
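The `embedding_kaldi` task reads a `wav.scp` file. In the Kaldi convention, each line of such a file holds an utterance ID followed by the path to its audio. The sketch below builds that text from a list of paths, using the file stem as the ID; `wav_scp_text` is a hypothetical helper, and it is an assumption that the Wespeaker CLI accepts exactly this plain two-column form.

```python
from pathlib import Path


def wav_scp_text(wav_paths):
    """Build Kaldi-style wav.scp content: one '<utt_id> <path>' line per file."""
    seen = set()
    lines = []
    for path in wav_paths:
        utt_id = Path(path).stem  # file name without directory or extension
        if utt_id in seen:
            raise ValueError(f"duplicate utterance id: {utt_id}")
        seen.add(utt_id)
        lines.append(f"{utt_id} {path}")
    return "".join(line + "\n" for line in lines)
```

Writing the result with `Path("wav.scp").write_text(wav_scp_text(paths))` produces a file that can be passed to `--wav_scp`.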

Python Programming Usage

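import wespeaker

# Path to the directory holding the downloaded model files
resnet293_download_dir = 'resnet293_download_dir'
model = wespeaker.load_model_local(resnet293_download_dir)
# Select the CUDA device for inference; a negative number means CPU
model.set_gpu(0)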

# embedding/embedding_kaldi/similarity/diarization
embedding = model.extract_embedding('audio.wav')
utt_names, embeddings = model.extract_embedding_list('wav.scp')
similarity = model.compute_similarity('audio1.wav', 'audio2.wav')
diar_result = model.diarize('audio.wav')

# register and recognize
model.register('spk1', 'spk1_audio1.wav')
model.register('spk2', 'spk2_audio1.wav')
model.register('spk3', 'spk3_audio1.wav')
result = model.recognize('spk1_audio2.wav')

📚 Documentation

Model Sources

Repository: https://github.com/wenet-e2e/wespeaker
Paper: https://arxiv.org/pdf/2210.17016.pdf
Demo: https://huggingface.co/spaces/wenet/wespeaker_demo

Results on VoxCeleb

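Equal error rate (EER, %) on the cleaned VoxCeleb1 trial lists. LM = large margin fine-tuning; AS-Norm = adaptive score normalization.

Model	Params	FLOPs	LM	AS-Norm	vox1-O-clean	vox1-E-clean	vox1-H-clean
ResNet293-TSTP-emb256	28.62M	28.10G	×	×	0.595	0.756	1.433
ResNet293-TSTP-emb256	28.62M	28.10G	×	√	0.537	0.701	1.276
ResNet293-TSTP-emb256	28.62M	28.10G	√	×	0.532	0.707	1.311
ResNet293-TSTP-emb256	28.62M	28.10G	√	√	0.447	0.657	1.183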
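The AS-Norm column refers to adaptive score normalization, which rescales a raw trial score using the statistics of each side's highest-scoring comparisons against a cohort of other speakers. The sketch below shows the usual symmetric form; `as_norm` is an illustrative helper, the cohort scores are assumed to be precomputed, and `top_n=300` is a placeholder, not a setting taken from this model card.

```python
import statistics


def as_norm(raw_score, enroll_cohort_scores, test_cohort_scores, top_n=300):
    """Adaptive symmetric score normalization for one trial.

    enroll_cohort_scores: scores of the enrollment embedding against the cohort.
    test_cohort_scores:   scores of the test embedding against the cohort.
    top_n:                number of highest cohort scores kept on each side
                          (illustrative default).
    """
    def top_stats(scores):
        top = sorted(scores, reverse=True)[:top_n]
        return statistics.fmean(top), statistics.pstdev(top)

    mean_e, std_e = top_stats(enroll_cohort_scores)
    mean_t, std_t = top_stats(test_cohort_scores)
    # Average the z-scores computed from the enrollment side and the test side.
    return 0.5 * ((raw_score - mean_e) / std_e + (raw_score - mean_t) / std_t)
```

Because each side is normalized against its own closest cohort speakers, scores from different recording conditions become more comparable under a single threshold.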

📄 License

This model is licensed under CC-BY-4.0.

📚 Citation

@inproceedings{wang2023wespeaker,
  title={Wespeaker: A research and production oriented speaker embedding learning toolkit},
  author={Wang, Hongji and Liang, Chengdong and Wang, Shuai and Chen, Zhengyang and Zhang, Binbin and Xiang, Xu and Deng, Yanlei and Qian, Yanmin},
  booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1--5},
  year={2023},
  organization={IEEE}
}
