# 🚀 "Powerset" Speaker Segmentation
This open-source model performs speaker segmentation. It takes a 10-second chunk of mono audio sampled at 16kHz and outputs a (num_frames, num_classes) matrix, where the 7 classes encode the possible speaking states: non-speech, each single speaker, and each combination of overlapping speakers.
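With up to 3 speakers per chunk and at most 2 active in any given frame, the powerset encoding yields exactly C(3,0) + C(3,1) + C(3,2) = 1 + 3 + 3 = 7 classes. A minimal stdlib sketch enumerating them:

```python
from itertools import combinations

max_speakers_per_chunk, max_speakers_per_frame = 3, 2

# enumerate powerset classes: every subset of at most 2 of the 3 speakers
classes = [
    subset
    for size in range(max_speakers_per_frame + 1)
    for subset in combinations(range(max_speakers_per_chunk), size)
]

print(len(classes))  # → 7
print(classes)
# → [(), (0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]
```

The empty tuple is the non-speech class, single-element tuples are the single-speaker classes, and two-element tuples are the overlapping-speaker classes.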
## 🚀 Quick Start
Using this open-source model in production? Consider switching to pyannoteAI for better and faster options.
## ✨ Features
- Ingests 10 seconds of 16kHz mono audio.
- Outputs speaker diarization as a multi-class matrix.
- Applicable to various speaker-related detection tasks: speaker diarization, voice activity detection, and overlapped speech detection.
## 📦 Installation
- Install [`pyannote.audio`](https://github.com/pyannote/pyannote-audio) 3.0 with `pip install pyannote.audio`.
- Accept the [`pyannote/segmentation-3.0`](https://hf.co/pyannote/segmentation-3.0) user conditions.
- Create an access token at [`hf.co/settings/tokens`](https://hf.co/settings/tokens).
## 💻 Usage Examples
### Basic Usage
```python
import torch
from pyannote.audio import Model
from pyannote.audio.utils.powerset import Powerset

# load the pretrained model (requires the access token created above)
model = Model.from_pretrained(
    "pyannote/segmentation-3.0", use_auth_token="HF_TOKEN_GOES_HERE")

# a batch of one 10-second mono chunk sampled at 16kHz
batch_size, num_channels = 1, 1
duration, sample_rate = 10, 16000
waveform = torch.randn(batch_size, num_channels, duration * sample_rate)

# (batch_size, num_frames, num_classes) powerset-encoded output
powerset_encoding = model(waveform)

# convert from powerset to multi-label encoding
max_speakers_per_chunk, max_speakers_per_frame = 3, 2
to_multilabel = Powerset(
    max_speakers_per_chunk, max_speakers_per_frame).to_multilabel
multilabel_encoding = to_multilabel(powerset_encoding)
```
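For intuition, the powerset-to-multilabel conversion amounts to an argmax over the 7 classes followed by a lookup of which speakers belong to the winning class. A self-contained plain-Python sketch of this idea (illustrative, not the library's implementation):

```python
from itertools import combinations

max_speakers_per_chunk, max_speakers_per_frame = 3, 2

# the 7 powerset classes, in enumeration order: subsets of up to 2 speakers
classes = [set(c)
           for size in range(max_speakers_per_frame + 1)
           for c in combinations(range(max_speakers_per_chunk), size)]

def to_multilabel(powerset_frame):
    """Map one frame of 7 powerset scores to a per-speaker 0/1 vector."""
    best = max(range(len(powerset_frame)), key=lambda i: powerset_frame[i])
    return [int(spk in classes[best]) for spk in range(max_speakers_per_chunk)]

# a frame where the "speakers 0 and 2 overlapping" class scores highest
scores = [0.1, 0.2, 0.1, 0.1, 0.1, 0.9, 0.1]
print(to_multilabel(scores))  # → [1, 0, 1]
```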
### Advanced Usage

#### Speaker diarization
This model cannot perform speaker diarization of full recordings on its own (it only processes 10-second chunks). See the [pyannote/speaker-diarization-3.0](https://hf.co/pyannote/speaker-diarization-3.0) pipeline, which combines it with an additional speaker embedding model to perform speaker diarization of full recordings.
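For intuition, a full-recording pipeline slides the 10-second window over the audio and aggregates the per-chunk outputs. A minimal sketch of the windowing step (the `chunk_starts` helper and its default 1-second step are illustrative assumptions, not the actual pipeline's values):

```python
def chunk_starts(total_duration, chunk_duration=10.0, step=1.0):
    """Start times (in seconds) of the sliding windows a full-recording
    pipeline would feed to the segmentation model."""
    starts = []
    t = 0.0
    while t + chunk_duration <= total_duration:
        starts.append(round(t, 6))  # round away float accumulation error
        t += step
    return starts

# a 30-second recording, windowed with a 5-second step
print(chunk_starts(30.0, step=5.0))  # → [0.0, 5.0, 10.0, 15.0, 20.0]
```

Because speaker indices are only consistent within a chunk, the pipeline then needs speaker embeddings to cluster and match speakers across chunks.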
#### Voice activity detection
```python
from pyannote.audio.pipelines import VoiceActivityDetection

# `model` is the segmentation model loaded in the Basic Usage example
pipeline = VoiceActivityDetection(segmentation=model)
HYPER_PARAMETERS = {
    # remove speech regions shorter than this many seconds
    "min_duration_on": 0.0,
    # fill non-speech gaps shorter than this many seconds
    "min_duration_off": 0.0,
}
pipeline.instantiate(HYPER_PARAMETERS)
# `vad` is a pyannote.core.Annotation containing speech regions
vad = pipeline("audio.wav")
```
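For intuition, here is roughly what the two hyper-parameters control, as a stdlib sketch (the `postprocess` helper is illustrative, not the pipeline's actual implementation): fill short non-speech gaps, then drop short speech regions.

```python
def postprocess(segments, min_duration_on=0.0, min_duration_off=0.0):
    """Sketch of the two hyper-parameters on a list of (start, end) pairs."""
    # fill non-speech gaps shorter than min_duration_off
    merged = []
    for start, end in segments:
        if merged and start - merged[-1][1] < min_duration_off:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    # remove speech regions shorter than min_duration_on
    return [(s, e) for s, e in merged if e - s >= min_duration_on]

# the 0.2 s gap is filled, and no surviving region is shorter than 0.5 s
print(postprocess([(0.0, 1.0), (1.2, 1.3), (5.0, 6.0)],
                  min_duration_on=0.5, min_duration_off=0.4))
# → [(0.0, 1.3), (5.0, 6.0)]
```

With both values at 0.0, as above, the pipeline's raw decisions pass through unchanged.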
#### Overlapped speech detection
```python
from pyannote.audio.pipelines import OverlappedSpeechDetection

# `model` is the segmentation model loaded in the Basic Usage example
pipeline = OverlappedSpeechDetection(segmentation=model)
HYPER_PARAMETERS = {
    # remove overlap regions shorter than this many seconds
    "min_duration_on": 0.0,
    # fill non-overlap gaps shorter than this many seconds
    "min_duration_off": 0.0,
}
pipeline.instantiate(HYPER_PARAMETERS)
# `osd` is a pyannote.core.Annotation containing overlapped speech regions
osd = pipeline("audio.wav")
```
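Conceptually, overlapped speech is any frame where the multi-label encoding marks two or more speakers as active. A toy stdlib illustration (the frame values are made up):

```python
# toy multi-label frames: rows = frames, columns = 3 speakers (1 = active)
frames = [
    [0, 0, 0],  # non-speech
    [1, 0, 0],  # speaker 0 alone
    [1, 1, 0],  # speakers 0 and 1 overlapping
    [0, 1, 1],  # speakers 1 and 2 overlapping
]

# a frame is overlapped speech when at least two speakers are active
overlap = [sum(frame) >= 2 for frame in frames]
print(overlap)  # → [False, False, True, True]
```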
## 📚 Documentation
The concepts behind this model are described in detail in this [paper](https://www.isca-speech.org/archive/interspeech_2023/plaquet23_interspeech.html).
It has been trained by Séverin Baroudi with [pyannote.audio](https://github.com/pyannote/pyannote-audio) 3.0.0 using the combined training sets of AISHELL, AliMeeting, AMI, AVA-AVD, DIHARD, Ego4D, MSDWild, REPERE, and VoxConverse.
This [companion repository](https://github.com/FrenchKrab/IS2023-powerset-diarization/) by Alexis Plaquet also provides instructions on how to train or fine-tune such a model on your own data.
## 📄 License
This model is licensed under the MIT license.
## Citations
```bibtex
@inproceedings{Plaquet23,
  author={Alexis Plaquet and Hervé Bredin},
  title={{Powerset multi-class cross entropy loss for neural speaker diarization}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}

@inproceedings{Bredin23,
  author={Hervé Bredin},
  title={{pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}
```

## ⚠️ Important Note
The collected information will help acquire a better knowledge of the pyannote.audio user base and help its maintainers improve it further. Though this model uses the MIT license and will always remain open-source, we will occasionally email you about premium models and paid services around pyannote.
| Property | Details |
|----------|---------|
| Tags | pyannote, pyannote-audio, pyannote-audio-model, audio, voice, speech, speaker, speaker-diarization, speaker-change-detection, speaker-segmentation, voice-activity-detection, overlapped-speech-detection, resegmentation |
| License | MIT |
| Inference | false |
| Extra Gated Prompt | The collected information will help acquire a better knowledge of the pyannote.audio user base and help its maintainers improve it further. Though this model uses the MIT license and will always remain open-source, we will occasionally email you about premium models and paid services around pyannote. |
| Extra Gated Fields | Company/university: text; Website: text |