Open-source speaker segmentation model - Detect speaker changes and voice activity in audio

Speaker Segmentation

Developed by pyannote

Speaker segmentation model based on pyannote.audio, used to detect speaker changes and speech activity in audio

Audio Processing | Open Source | License: MIT | #Speaker segmentation #Overlapping speech detection #End-to-end model

Downloads 182

Release Time: 3/2/2022

Model Overview

This model focuses on speaker segmentation: it identifies the speech segments of different speakers in audio, but it does not perform speaker diarization.

Model Features

End-to-end speaker segmentation

Performs end-to-end speaker segmentation and identifies speaker changes in audio

Overlapping speech detection

Capable of detecting overlapping speech segments

Speech activity detection

Can identify speech activity regions in audio

Model Capabilities

Speaker change detection

Speech activity detection

Overlapping speech detection

Audio segmentation
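
One way to see how these capabilities relate: given labelled turns as (start, end, label) tuples, speech activity is where at least one turn is active, and overlapping speech is where at least two are active. The sketch below is a plain-Python illustration under that assumption, not the model's internal method. The tuple layout, the sample values, and the function name `regions_with_at_least` are hypothetical.

```python
from typing import Iterable, List, Tuple

# Hypothetical layout for one turn: (start_seconds, end_seconds, label)
Turn = Tuple[float, float, str]


def regions_with_at_least(turns: Iterable[Turn], k: int) -> List[Tuple[float, float]]:
    """Return merged time regions in which at least k turns are active."""
    # Each turn contributes a +1 event at its start and a -1 event at its end.
    events = []
    for start, end, _label in turns:
        if end > start:  # ignore empty or inverted turns
            events.append((start, 1))
            events.append((end, -1))
    # At equal timestamps, ends (-1) are processed before starts (+1).
    events.sort(key=lambda event: (event[0], event[1]))

    regions: List[Tuple[float, float]] = []
    active = 0
    region_start = 0.0
    for time, delta in events:
        previous = active
        active += delta
        if previous < k <= active:
            # The number of active turns just reached k: a region opens.
            region_start = time
        elif active < k <= previous:
            # The number of active turns just dropped below k: the region closes.
            if regions and regions[-1][1] == region_start:
                # Merge with the previous region when the two touch.
                regions[-1] = (regions[-1][0], time)
            else:
                regions.append((region_start, time))
    return regions


# Made-up turns for illustration only.
example_turns = [
    (0.0, 2.0, "A"),
    (1.5, 3.0, "B"),
    (3.0, 4.0, "A"),
    (5.0, 6.0, "B"),
]
speech_regions = regions_with_at_least(example_turns, 1)   # [(0.0, 4.0), (5.0, 6.0)]
overlap_regions = regions_with_at_least(example_turns, 2)  # [(1.5, 2.0)]
```

Using a single threshold `k` keeps both views in one function: `k=1` gives speech activity regions and `k=2` gives overlapping speech regions.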

Use Cases

Speech analysis

Meeting recording analysis

Analyze segments from different speakers in meeting recordings

Separates the speech of different speakers into distinct segments

Interview transcription

Segment speech from different speakers in interview recordings

Facilitates subsequent transcription and content analysis

🚀 Speaker Segmentation

This open-source model focuses on speaker segmentation and is built on pyannote.audio 2.1. It detects speaker changes, speech activity, and overlapping speech; it does not perform speaker diarization. If you're using this model in production, consider pyannoteAI for better and faster options.

🚀 Quick Start

Relies on pyannote.audio 2.1: see installation instructions.

# 1. visit hf.co/pyannote/segmentation and accept user conditions
# 2. visit hf.co/settings/tokens to create an access token
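# 3. instantiate pretrained speaker segmentation pipeline,
#    passing the access token created in step 2
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained("pyannote/speaker-segmentation",
                                    use_auth_token="ACCESS_TOKEN_GOES_HERE")
# 4. apply the pipeline to an audio file
output = pipeline("audio.wav")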

for turn, _, speaker in output.itertracks(yield_label=True):
    # speaker speaks between turn.start and turn.end
    ...
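
As an illustration of what the loop body might do, the sketch below sums the duration of all turns per label. It is self-contained, so it uses a hypothetical `Turn` stand-in with `start` and `end` attributes and made-up labels instead of real pipeline output. The helper name `speaking_time` is also an assumption, not part of pyannote.audio.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class Turn(NamedTuple):
    """Hypothetical stand-in for the `turn` objects yielded by the loop above."""
    start: float
    end: float


def speaking_time(tracks: Iterable[Tuple[Turn, object, str]]) -> Dict[str, float]:
    """Sum the duration in seconds of all turns, grouped by label."""
    totals: Dict[str, float] = defaultdict(float)
    for turn, _, label in tracks:
        totals[label] += turn.end - turn.start
    return dict(totals)


# Made-up tracks in the same (turn, track, label) shape as the loop above.
example_tracks = [
    (Turn(0.0, 2.5), None, "SPEAKER_00"),
    (Turn(2.0, 4.0), None, "SPEAKER_01"),
    (Turn(4.0, 5.0), None, "SPEAKER_00"),
]
totals = speaking_time(example_tracks)
# totals == {"SPEAKER_00": 3.5, "SPEAKER_01": 2.0}
```

With real output, the same helper could presumably be called as `speaking_time(output.itertracks(yield_label=True))`. The totals are grouped by whatever labels the pipeline emits, and overlapping turns are counted once per label.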

⚠️ Important Note

This pipeline does not address speaker diarization.

📚 Documentation

Datasets

ami
dihard
voxconverse

Gated Access

The collected information will help the maintainers better understand the pyannote.audio user base and apply for grants to improve it further. If you are an academic researcher, please cite the relevant papers in your own publications that use the model. If you work for a company, please consider contributing back to pyannote.audio development (e.g. through unrestricted gifts). We also provide scientific consulting services around speaker diarization and machine listening.

Property	Details
Company/university	text
Website	text
I plan to use this model for (task, type of audio data, etc)	text

📄 License

This project is licensed under the MIT license.

Support

For commercial enquiries and scientific consulting, please contact me.
For technical questions and bug reports, please check the pyannote.audio GitHub repository.

Citation

@inproceedings{Bredin2021,
  Title = {{End-to-end speaker segmentation for overlap-aware resegmentation}},
  Author = {{Bredin}, Herv{\'e} and {Laurent}, Antoine},
  Booktitle = {Proc. Interspeech 2021},
  Address = {Brno, Czech Republic},
  Month = {August},
  Year = {2021},
}

@inproceedings{Bredin2020,
  Title = {{pyannote.audio: neural building blocks for speaker diarization}},
  Author = {{Bredin}, Herv{\'e} and {Yin}, Ruiqing and {Coria}, Juan Manuel and {Gelly}, Gregory and {Korshunov}, Pavel and {Lavechin}, Marvin and {Fustes}, Diego and {Titeux}, Hadrien and {Bouaziz}, Wassim and {Gill}, Marie-Philippe},
  Booktitle = {ICASSP 2020, IEEE International Conference on Acoustics, Speech, and Signal Processing},
  Address = {Barcelona, Spain},
  Month = {May},
  Year = {2020},
}
