Hubert-base-superb-sid Open-source Speaker Identification Model - A Practical Choice Optimized for SUPERB Tasks

Hubert Base Superb Sid

Developed by superb

Hubert-based speaker recognition model optimized for the SUPERB benchmark tasks

Speaker Analysis

Transformers

English

Open Source License: Apache-2.0

#Speaker Recognition #16kHz Audio Processing #VoxCeleb Dataset

Downloads 673

Release Time: 3/2/2022

Model Overview

This model is a speaker identification system based on the Hubert architecture. It classifies each speech utterance as one of a predefined set of speakers and is trained on the VoxCeleb1 dataset.

Model Features

Hubert Architecture-based

Uses facebook/hubert-base-ls960 as the base model, which is pre-trained on 16kHz sampled speech audio

Specialized for Speaker Recognition

Optimized specifically for speaker identification tasks and trained on the VoxCeleb1 dataset

High Accuracy

Achieves an accuracy of 0.8071 on the test set with the transformers port (0.8142 with the original s3prl implementation)

Model Capabilities

Speaker Recognition

Speech Classification

Audio Analysis

Use Cases

Security Verification

Voice Identity Verification

Authenticate user identity through voice recognition for security verification

Identifies which speaker from a predefined set produced an utterance

Speech Analysis

Meeting Transcript Analysis

Identify which speaker produced each speech segment in a meeting recording

Automatically distinguishes between different speakers

🚀 Hubert-Base for Speaker Identification

This model is designed for speaker identification, leveraging the Hubert architecture to classify speaker identities in speech audio.

🚀 Quick Start

The code in the Usage Examples section shows how to run the model, either through a pipeline or by calling the model directly.

✨ Features

Ported Version: This is a ported version of S3PRL's Hubert for the SUPERB Speaker Identification task.
Base Model: The base model is hubert-base-ls960, pretrained on 16kHz sampled speech audio.
Task and Dataset: It is used for the Speaker Identification (SI) task, adopting the widely used VoxCeleb1 dataset.

💻 Usage Examples

Basic Usage

from datasets import load_dataset
from transformers import pipeline

dataset = load_dataset("anton-l/superb_demo", "si", split="test")

classifier = pipeline("audio-classification", model="superb/hubert-base-superb-sid")
labels = classifier(dataset[0]["file"], top_k=5)
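
The audio-classification pipeline returns a list of dictionaries, each holding a score and a label. The following is a minimal sketch of post-processing such a result. The helper name confident_labels, the threshold values, and the sample_output list are assumptions made for illustration; the labels and scores are placeholders, not real model output.

```python
from typing import Dict, List, Union

Prediction = Dict[str, Union[str, float]]


def confident_labels(predictions: List[Prediction], min_score: float = 0.5) -> List[str]:
    """Return the labels whose score is at least min_score, highest score first."""
    kept = [p for p in predictions if p["score"] >= min_score]
    kept.sort(key=lambda p: p["score"], reverse=True)
    return [p["label"] for p in kept]


# Hypothetical pipeline output (placeholder labels and scores, for illustration only)
sample_output = [
    {"score": 0.08, "label": "id10005"},
    {"score": 0.71, "label": "id10003"},
    {"score": 0.12, "label": "id10002"},
]

print(confident_labels(sample_output, min_score=0.1))
```

A threshold like this only filters low-confidence guesses; the model still chooses among the speakers it was trained on.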

Advanced Usage

import torch
import librosa
from datasets import load_dataset
from transformers import HubertForSequenceClassification, Wav2Vec2FeatureExtractor

def map_to_array(example):
    speech, _ = librosa.load(example["file"], sr=16000, mono=True)
    example["speech"] = speech
    return example

# load a demo dataset and read audio files
dataset = load_dataset("anton-l/superb_demo", "si", split="test")
dataset = dataset.map(map_to_array)

model = HubertForSequenceClassification.from_pretrained("superb/hubert-base-superb-sid")
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("superb/hubert-base-superb-sid")

# compute attention masks and normalize the waveform if needed
inputs = feature_extractor(dataset[:2]["speech"], sampling_rate=16000, padding=True, return_tensors="pt")

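# run inference without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits

# pick the highest-scoring class for each utterance and map it to its speaker label
predicted_ids = torch.argmax(logits, dim=-1)
labels = [model.config.id2label[_id] for _id in predicted_ids.tolist()]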
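
The example above keeps only the single highest-scoring class per utterance. To rank several candidate speakers with probabilities, a softmax over the logits can be applied first. This is a plain-Python sketch under stated assumptions: the function names, example_logits, and example_id2label are hypothetical stand-ins for one row of the model's logits and for model.config.id2label.

```python
import math
from typing import Dict, List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Convert raw logits to probabilities (max is subtracted for numerical stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_speakers(
    logits: List[float], id2label: Dict[int, str], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Hypothetical logits for one utterance and a hypothetical three-speaker label map
example_logits = [1.0, 3.0, 0.5]
example_id2label = {0: "speaker_a", 1: "speaker_b", 2: "speaker_c"}

for label, prob in top_k_speakers(example_logits, example_id2label):
    print(f"{label}: {prob:.3f}")
```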

📚 Documentation

Model description

This is a ported version of S3PRL's Hubert for the SUPERB Speaker Identification task.

The base model is hubert-base-ls960, which is pretrained on 16kHz sampled speech audio. When using the model, make sure that your speech input is also sampled at 16kHz.
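
One way to guard against a sample-rate mismatch is to inspect the file header before inference. The sketch below uses Python's standard wave module, which reads uncompressed PCM WAV files only; the helper names wav_sample_rate, is_model_ready, and write_silence are hypothetical, and the temporary files exist only for the demonstration. Audio at another rate needs resampling, which the Advanced Usage example handles by passing sr=16000 to librosa.load.

```python
import os
import tempfile
import wave

EXPECTED_RATE = 16000  # sample rate the base model was pretrained on


def wav_sample_rate(path: str) -> int:
    """Read the sample rate from a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def is_model_ready(path: str, expected_rate: int = EXPECTED_RATE) -> bool:
    """Return True if the file is already sampled at the expected rate."""
    return wav_sample_rate(path) == expected_rate


def write_silence(path: str, rate: int, seconds: float = 0.1) -> None:
    """Demo helper: write a short mono 16-bit silent WAV file at the given rate."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(rate)
        wav_file.writeframes(b"\x00\x00" * int(rate * seconds))


with tempfile.TemporaryDirectory() as tmp_dir:
    for rate in (16000, 8000):
        demo_path = os.path.join(tmp_dir, f"demo_{rate}.wav")
        write_silence(demo_path, rate)
        print(rate, is_model_ready(demo_path))
```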

For more information, refer to SUPERB: Speech processing Universal PERformance Benchmark.

Task and dataset description

Speaker Identification (SI) assigns each utterance to a speaker identity as a multi-class classification problem, where the speakers belong to the same predefined set for both training and testing. The widely used VoxCeleb1 dataset is adopted.

For the original model's training and evaluation instructions, refer to the S3PRL downstream task README.

Eval results

The evaluation metric is accuracy.

Evaluation	Accuracy
Test (s3prl)	`0.8142`
Test (transformers)	`0.8071`
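
Accuracy here is the fraction of utterances whose predicted speaker matches the reference speaker. A minimal sketch of that computation follows; the accuracy function and the toy label lists are illustrative assumptions, not the evaluation script used to produce the figures above.

```python
from typing import List, Sequence


def accuracy(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of positions where the predicted label equals the reference label."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


# Toy labels for illustration only
predicted_labels: List[str] = ["a", "b", "c", "a"]
reference_labels: List[str] = ["a", "b", "a", "a"]

print(accuracy(predicted_labels, reference_labels))  # 3 of 4 correct -> 0.75
```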

BibTeX entry and citation info

@article{yang2021superb,
  title={SUPERB: Speech processing Universal PERformance Benchmark},
  author={Yang, Shu-wen and Chi, Po-Han and Chuang, Yung-Sung and Lai, Cheng-I Jeff and Lakhotia, Kushal and Lin, Yist Y and Liu, Andy T and Shi, Jiatong and Chang, Xuankai and Lin, Guan-Ting and others},
  journal={arXiv preprint arXiv:2105.01051},
  year={2021}
}

📄 License

This project is licensed under the Apache 2.0 license.
