🚀 pyannote.audio // speaker segmentation
This project provides a speaker segmentation model that can be used for voice activity detection, overlapped speech detection, and resegmentation. The model comes from "End-to-end speaker segmentation for overlap-aware resegmentation" by Hervé Bredin and Antoine Laurent, and relies on the in-development pyannote.audio 2.0 (see installation instructions).
🚀 Quick Start
This model supports several audio processing tasks, including voice activity detection, overlapped speech detection, and resegmentation. For detailed usage, see the "Usage Examples" section below.
✨ Features
- Voice activity detection
- Overlapped speech detection
- Resegmentation
- Raw scores extraction
📦 Installation
This model relies on pyannote.audio 2.0, which is currently in development: see its installation instructions.
💻 Usage Examples
Basic Usage
Voice activity detection
```python
from pyannote.audio.pipelines import VoiceActivityDetection

pipeline = VoiceActivityDetection(segmentation="pyannote/segmentation")
HYPER_PARAMETERS = {
    # onset/offset activation thresholds
    "onset": 0.5, "offset": 0.5,
    # remove speech regions shorter than that many seconds
    "min_duration_on": 0.0,
    # fill non-speech regions shorter than that many seconds
    "min_duration_off": 0.0,
}
pipeline.instantiate(HYPER_PARAMETERS)
vad = pipeline("audio.wav")
# `vad` is a pyannote.core.Annotation containing speech regions
```
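For example, the detected speech regions can be iterated over as follows (a minimal sketch, assuming the pipeline returns a pyannote.core.Annotation as noted above):

```python
# Iterate over detected speech regions; `vad` comes from the snippet above
for speech in vad.get_timeline().support():
    # `speech` is a pyannote.core.Segment with .start and .end in seconds
    print(f"speech from {speech.start:.1f}s to {speech.end:.1f}s")
```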
Overlapped speech detection
```python
from pyannote.audio.pipelines import OverlappedSpeechDetection

pipeline = OverlappedSpeechDetection(segmentation="pyannote/segmentation")
pipeline.instantiate(HYPER_PARAMETERS)  # same structure as for voice activity detection
osd = pipeline("audio.wav")
# `osd` is a pyannote.core.Annotation containing overlapped speech regions
```
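From that output, the total amount of overlapped speech can be computed directly (a sketch, assuming `osd` is a pyannote.core.Annotation as above):

```python
# total duration (in seconds) of detected overlapped speech regions
total_overlap = osd.get_timeline().support().duration()
print(f"{total_overlap:.1f}s of overlapped speech")
```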
Resegmentation
```python
from pyannote.audio.pipelines import Resegmentation

pipeline = Resegmentation(segmentation="pyannote/segmentation",
                          diarization="baseline")
pipeline.instantiate(HYPER_PARAMETERS)  # same structure as for voice activity detection
# `baseline` must be a pyannote.core.Annotation containing a baseline diarization
resegmented_baseline = pipeline({"audio": "audio.wav", "baseline": baseline})
```
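If the baseline diarization is available as an RTTM file, it can be loaded into a pyannote.core.Annotation first (a minimal sketch; `baseline.rttm` and the `audio` URI are hypothetical placeholders):

```python
from pyannote.database.util import load_rttm

# load_rttm returns a {uri: Annotation} dictionary;
# "baseline.rttm" and "audio" are hypothetical placeholders
baseline = load_rttm("baseline.rttm")["audio"]
```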
Raw scores
```python
from pyannote.audio import Inference

inference = Inference("pyannote/segmentation")
segmentation = inference("audio.wav")
# `segmentation` is a pyannote.core.SlidingWindowFeature of raw scores
```
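The returned scores can then be inspected like a regular NumPy array (a sketch; the exact array shape depends on the inference configuration and is an assumption here):

```python
# `segmentation.data` is a plain NumPy array of raw scores; depending on the
# inference configuration it may be shaped, e.g., (num_chunks, num_frames,
# num_speakers) -- this shape is an assumption, not guaranteed
print(segmentation.data.shape)

# `segmentation.sliding_window` describes how array indices map to time
print(segmentation.sliding_window)
```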
Advanced Usage
In order to reproduce the results of the paper "End-to-end speaker segmentation for overlap-aware resegmentation", use the following hyper-parameters:
| Task | Dataset | onset | offset | min_duration_on | min_duration_off |
|------|---------|-------|--------|-----------------|------------------|
| Voice activity detection | AMI Mix-Headset | 0.684 | 0.577 | 0.181 | 0.037 |
| Voice activity detection | DIHARD3 | 0.767 | 0.377 | 0.136 | 0.067 |
| Voice activity detection | VoxConverse | 0.767 | 0.713 | 0.182 | 0.501 |
| Overlapped speech detection | AMI Mix-Headset | 0.448 | 0.362 | 0.116 | 0.187 |
| Overlapped speech detection | DIHARD3 | 0.430 | 0.320 | 0.091 | 0.144 |
| Overlapped speech detection | VoxConverse | 0.587 | 0.426 | 0.337 | 0.112 |
| Resegmentation of VBx | AMI Mix-Headset | 0.542 | 0.527 | 0.044 | 0.705 |
| Resegmentation of VBx | DIHARD3 | 0.592 | 0.489 | 0.163 | 0.182 |
| Resegmentation of VBx | VoxConverse | 0.537 | 0.724 | 0.410 | 0.563 |
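For example, to reproduce the voice activity detection results on DIHARD3, instantiate the pipeline with the corresponding row of the table (a minimal sketch; `audio.wav` stands for a hypothetical DIHARD3 file):

```python
from pyannote.audio.pipelines import VoiceActivityDetection

pipeline = VoiceActivityDetection(segmentation="pyannote/segmentation")
# hyper-parameters from the "Voice activity detection / DIHARD3" row above
pipeline.instantiate({
    "onset": 0.767, "offset": 0.377,
    "min_duration_on": 0.136,
    "min_duration_off": 0.067,
})
vad = pipeline("audio.wav")
```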
Expected outputs (and the VBx baseline) are also provided in the /reproducible_research sub-directories.
🤝 Support
For commercial enquiries and scientific consulting, please contact me.
For technical questions and bug reports, please use the pyannote.audio GitHub repository.
📄 License
This project is licensed under the MIT license.
📚 Citation
```bibtex
@inproceedings{Bredin2021,
  Title = {{End-to-end speaker segmentation for overlap-aware resegmentation}},
  Author = {{Bredin}, Herv{\'e} and {Laurent}, Antoine},
  Booktitle = {Proc. Interspeech 2021},
  Address = {Brno, Czech Republic},
  Month = {August},
  Year = {2021},
}

@inproceedings{Bredin2020,
  Title = {{pyannote.audio: neural building blocks for speaker diarization}},
  Author = {{Bredin}, Herv{\'e} and {Yin}, Ruiqing and {Coria}, Juan Manuel and {Gelly}, Gregory and {Korshunov}, Pavel and {Lavechin}, Marvin and {Fustes}, Diego and {Titeux}, Hadrien and {Bouaziz}, Wassim and {Gill}, Marie-Philippe},
  Booktitle = {ICASSP 2020, IEEE International Conference on Acoustics, Speech, and Signal Processing},
  Address = {Barcelona, Spain},
  Month = {May},
  Year = {2020},
}
```