Overlapped-Speech-Detection Open Source Model - Accurately Detect Time Periods of Multiple People Speaking Simultaneously in Audio

Overlapped Speech Detection

Developed by pyannote

A pre-trained model for detecting overlapped speech in audio, capable of identifying time segments where two or more speakers are active simultaneously.

Speaker Analysis | Open Source | License: MIT | Tags: #Overlapped speech detection #Speaker diarization #End-to-end model

Downloads: 144.68k

Release date: 3/2/2022

Model Overview

This model detects overlapped speech in audio, that is, the segments where two or more speakers are talking simultaneously. It is suited to speech processing, speaker diarization, and related tasks.
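To make the definition concrete: given per-speaker activity intervals (for example, a hand-labelled reference annotation), the overlapped regions are the spans where the running count of active speakers is two or more. The sketch below is a plain-Python illustration of that definition only, not how the pyannote model works (the model predicts overlap directly from audio); the function name `overlap_regions` and the input format are assumptions.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds


def overlap_regions(speakers: Dict[str, List[Interval]]) -> List[Interval]:
    """Return the spans where two or more speakers are active at once (illustrative)."""
    # Sweep-line events: +1 when a speaker starts talking, -1 when one stops.
    events = []
    for intervals in speakers.values():
        for start, end in intervals:
            events.append((start, +1))
            events.append((end, -1))
    # Tuples sort by time, then -1 before +1, so intervals that merely touch
    # (one ends exactly when another begins) are not counted as overlap.
    events.sort()

    regions: List[Interval] = []
    active = 0
    overlap_start: Optional[float] = None
    for time, delta in events:
        active += delta
        if active >= 2 and overlap_start is None:
            overlap_start = time
        elif active < 2 and overlap_start is not None:
            if regions and regions[-1][1] == overlap_start:
                # Glue onto the previous region when the two are contiguous.
                regions[-1] = (regions[-1][0], time)
            elif time > overlap_start:
                regions.append((overlap_start, time))
            overlap_start = None
    return regions


speakers = {"A": [(0.0, 5.0)], "B": [(3.0, 8.0)], "C": [(7.0, 9.0)]}
print(overlap_regions(speakers))  # [(3.0, 5.0), (7.0, 8.0)]
```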

Model Features

Overlapped speech detection

Identifies the time segments in an audio recording where two or more speakers are active simultaneously

End-to-end training

Uses end-to-end training to learn features directly from raw audio

Pre-trained model

Provides an out-of-the-box pre-trained model, eliminating the need for training from scratch
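The end-to-end training mentioned above means the network maps audio to time-aligned predictions. One common way to frame overlap detection as a learning problem (an assumption about the general approach, not a description of how this particular model was trained) is frame-level binary classification: a frame is labelled 1 when two or more speakers are active in it. The sketch derives such targets from per-speaker intervals; the function name, the frame `step`, and the centre-of-frame rule are hypothetical choices.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds


def overlap_targets(speakers: Dict[str, List[Interval]], duration: float, step: float) -> List[int]:
    """Return one 0/1 label per frame: 1 if two or more speakers are active at the frame centre."""
    num_frames = int(round(duration / step))
    targets = []
    for i in range(num_frames):
        t = (i + 0.5) * step  # centre of frame i, in seconds
        # Count how many speakers have an interval covering time t.
        active = sum(
            any(start <= t < end for start, end in intervals)
            for intervals in speakers.values()
        )
        targets.append(1 if active >= 2 else 0)
    return targets


speakers = {"A": [(0.0, 0.6)], "B": [(0.4, 1.0)]}
print(overlap_targets(speakers, duration=1.0, step=0.1))  # [0, 0, 0, 0, 1, 1, 0, 0, 0, 0]
```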

Model Capabilities

Overlapped speech detection

Speaker diarization

Audio timeline analysis

Use Cases

Speech processing

Meeting transcript analysis

Analyzes overlapped dialogue segments in meeting recordings to improve transcription accuracy

Can identify segments where multiple people speak simultaneously
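In a transcription workflow, one way to use the detector's output is to flag transcript words that fall inside overlapped regions so they can be reviewed. The sketch assumes word-level timestamps given as `(text, start, end)` tuples from some ASR system (a hypothetical format) and overlap segments as `(start, end)` pairs, for example collected from `speech.start` and `speech.end` in the Quick Start loop; the function name is illustrative.

```python
from typing import List, Tuple

Interval = Tuple[float, float]   # (start, end) in seconds
Word = Tuple[str, float, float]  # (text, start, end): hypothetical ASR output format


def words_in_overlap(words: List[Word], overlaps: List[Interval]) -> List[str]:
    """Return the words whose time span intersects any overlapped-speech segment."""
    flagged = []
    for text, w_start, w_end in words:
        # Two intervals intersect when each one starts before the other ends.
        if any(w_start < o_end and o_start < w_end for o_start, o_end in overlaps):
            flagged.append(text)
    return flagged


words = [("so", 0.0, 0.3), ("I", 0.4, 0.5), ("think", 0.5, 0.9), ("yes", 1.2, 1.5)]
overlaps = [(0.45, 1.3)]
print(words_in_overlap(words, overlaps))  # ['I', 'think', 'yes']
```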

Speaker diarization

Provides overlapped speech detection functionality for speaker diarization systems

Improves the accuracy of speaker segmentation
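A diarization system that assigns only one speaker at a time misses the second speaker in every overlapped region. The sketch below shows one simple, hypothetical heuristic for using overlap segments to fix that: label each overlapped region with the two speakers whose turns lie closest to it. It is an illustration under stated assumptions (turns as `(start, end, speaker)` tuples, overlaps as `(start, end)` pairs), not the method used by pyannote.audio or the cited papers.

```python
from typing import Dict, List, Tuple

Turn = Tuple[float, float, str]  # (start, end, speaker label)


def assign_overlap_speakers(turns: List[Turn], overlaps: List[Tuple[float, float]]) -> List[Turn]:
    """Add each overlapped region for the two speakers whose turns are closest to it."""
    extra: List[Turn] = []
    for o_start, o_end in overlaps:
        # Gap between each speaker and the region (0.0 if one of its turns touches it).
        dist: Dict[str, float] = {}
        for s, e, spk in turns:
            gap = max(s - o_end, o_start - e, 0.0)
            dist[spk] = min(gap, dist.get(spk, float("inf")))
        # Keep the two closest speakers; ties are broken by label for determinism.
        for spk in sorted(dist, key=lambda k: (dist[k], k))[:2]:
            extra.append((o_start, o_end, spk))
    # Same-speaker segments may now overlap; merge them afterwards if needed.
    return sorted(turns + extra)


turns = [(0.0, 5.0, "A"), (5.0, 10.0, "B"), (12.0, 15.0, "C")]
print(assign_overlap_speakers(turns, [(4.0, 6.0)]))
```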

🚀 Overlapped speech detection

This open-source model performs overlapped speech detection and relies on pyannote.audio 2.1. It provides a convenient way to detect overlapped speech in audio data.

🚀 Quick Start

Requires pyannote.audio 2.1 (see the pyannote.audio installation instructions).

```python
# 1. visit hf.co/pyannote/segmentation and accept user conditions
# 2. visit hf.co/settings/tokens to create an access token
# 3. instantiate pretrained overlapped speech detection pipeline
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained("pyannote/overlapped-speech-detection",
                                    use_auth_token="ACCESS_TOKEN_GOES_HERE")
output = pipeline("audio.wav")

# support() merges the detected segments into a timeline of non-overlapping spans
for speech in output.get_timeline().support():
    # two or more speakers are active between speech.start and speech.end
    print(f"overlap from {speech.start:.1f}s to {speech.end:.1f}s")
```
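If you collect the `(speech.start, speech.end)` pairs from the loop above into a list, they can be post-processed in plain Python. The sketch roughly mirrors what `support()` does (merging overlapping or touching segments), adds an optional minimum-duration filter, and sums the total overlapped time. The function names and the `min_duration` parameter are illustrative assumptions, not part of the pyannote API.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds


def merge_segments(segments: List[Interval], min_duration: float = 0.0) -> List[Interval]:
    """Merge overlapping or touching segments, then drop those shorter than min_duration."""
    merged: List[Interval] = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            # Extend the previous segment instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return [(s, e) for s, e in merged if e - s >= min_duration]


def total_duration(segments: List[Interval]) -> float:
    """Sum of segment durations, in seconds."""
    return sum(end - start for start, end in segments)


segments = [(1.0, 2.5), (2.0, 3.0), (7.5, 7.6), (10.0, 12.0)]
merged = merge_segments(segments, min_duration=0.5)
print(merged)                  # [(1.0, 3.0), (10.0, 12.0)]
print(total_duration(merged))  # 4.0
```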

✨ Features

Tags: pyannote, pyannote-audio, pyannote-audio-pipeline, audio, voice, speech, speaker, overlapped-speech-detection, automatic-speech-recognition
Datasets: ami, dihard, voxconverse

Property	Details
Model Type	Overlapped speech detection model
Training Data	AMI, DIHARD, VoxConverse

⚠️ Important Note

The information collected when you accept the user conditions on Hugging Face helps the maintainers better understand the pyannote.audio user base and apply for grants to improve it further. If you are an academic researcher, please cite the relevant papers (listed under Citation) in your own publications that use the model. If you work for a company, please consider contributing back to pyannote.audio development (e.g. through unrestricted gifts). We also provide scientific consulting services around speaker diarization and machine listening.

💡 Usage Tip

Using this open-source model in production? Consider switching to pyannoteAI for better and faster options.

📚 Documentation

Support

For commercial enquiries and scientific consulting, please contact me.
For technical questions and bug reports, please check the pyannote.audio GitHub repository.

📄 License

This project is licensed under the MIT license.

📚 Citation

@inproceedings{Bredin2021,
  Title = {{End-to-end speaker segmentation for overlap-aware resegmentation}},
  Author = {{Bredin}, Herv{\'e} and {Laurent}, Antoine},
  Booktitle = {Proc. Interspeech 2021},
  Address = {Brno, Czech Republic},
  Month = {August},
  Year = {2021},
}

@inproceedings{Bredin2020,
  Title = {{pyannote.audio: neural building blocks for speaker diarization}},
  Author = {{Bredin}, Herv{\'e} and {Yin}, Ruiqing and {Coria}, Juan Manuel and {Gelly}, Gregory and {Korshunov}, Pavel and {Lavechin}, Marvin and {Fustes}, Diego and {Titeux}, Hadrien and {Bouaziz}, Wassim and {Gill}, Marie-Philippe},
  Booktitle = {ICASSP 2020, IEEE International Conference on Acoustics, Speech, and Signal Processing},
  Address = {Barcelona, Spain},
  Month = {May},
  Year = {2020},
}
