Open-source voice activity detection model - Accurately identify the time periods of voice activity in audio

Voice Activity Detection

Developed by pyannote

Voice activity detection model based on pyannote.audio 2.1, used to identify speech activity segments in audio

Speech Recognition | Open Source License: MIT | #Voice Activity Detection #Speaker Diarization #Overlapping Speech Processing

Downloads 7.7M

Release Time: 3/2/2022

Model Overview

This model detects speech activity in audio and reports the start and end time of each speech segment. It is suited to the preprocessing stage of speech processing workflows.

Model Features

High-Precision Speech Detection

Accurately detects speech activity segments in audio

End-to-End Processing

Provides a complete end-to-end voice activity detection solution

Easy Integration

Offers a simple Python interface for easy integration into existing systems

Model Capabilities

Voice Activity Detection

Audio Time Stamping

Speech/Non-Speech Classification

Use Cases

Speech Processing

Automatic Speech Recognition Preprocessing

Detects speech activity before audio reaches the ASR system, improving recognition efficiency

Reduces processing overhead for non-speech segments
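
The overhead reduction comes from handing the recognizer only the detected speech. A minimal sketch of that step, assuming the speech regions are already available as sorted, non-overlapping (start, end) pairs in seconds and the audio is a flat sequence of mono samples. `keep_speech` is a hypothetical helper written for illustration; it is not part of pyannote.audio.

```python
def keep_speech(samples, sample_rate, regions):
    """Return only the samples that fall inside speech regions.

    samples: sequence of mono audio samples
    sample_rate: samples per second
    regions: iterable of (start_s, end_s) pairs in seconds,
             assumed sorted and non-overlapping
    """
    kept = []
    for start_s, end_s in regions:
        # Convert seconds to sample indices, clamped to the signal length.
        first = max(0, int(round(start_s * sample_rate)))
        last = min(len(samples), int(round(end_s * sample_rate)))
        if last > first:
            kept.extend(samples[first:last])
    return kept


# Toy signal: 10 "samples" at 10 Hz, i.e. one second of audio.
audio = list(range(10))
speech_only = keep_speech(audio, 10, [(0.2, 0.4), (0.7, 0.9)])
print(speech_only)  # [2, 3, 7, 8]
```

Here six of the ten samples are dropped before any recognizer would see them. Real pipelines usually also keep the original timestamps so that recognized words can be mapped back to the full recording.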

Meeting Transcript Analysis

Marks speech segments in meeting recordings

Facilitates subsequent speaker analysis and content extraction

🚀 Voice Activity Detection

This open-source model provides voice activity detection. It relies on pyannote.audio 2.1 and is a useful building block for audio processing tasks. If you plan to use it in production, consider switching to pyannoteAI for better and faster options.

🚀 Quick Start

Requires pyannote.audio 2.1: see its installation instructions.

💻 Usage Examples

Basic Usage

# 1. visit hf.co/pyannote/segmentation and accept user conditions
# 2. visit hf.co/settings/tokens to create an access token
# 3. instantiate pretrained voice activity detection pipeline

from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained("pyannote/voice-activity-detection",
                                    use_auth_token="ACCESS_TOKEN_GOES_HERE")
output = pipeline("audio.wav")

for speech in output.get_timeline().support():
    # active speech between speech.start and speech.end
    ...
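
Downstream code often wants plain tuples rather than pipeline objects, and may want to bridge very short pauses between regions. The sketch below assumes each region exposes `.start` and `.end` in seconds, as in the loop above. `Region`, `merge_close_regions`, and the 0.5 s default gap are illustrative stand-ins and not part of pyannote.audio.

```python
from collections import namedtuple

# Stand-in for the objects yielded by the loop above; only .start/.end are used.
Region = namedtuple("Region", ["start", "end"])


def merge_close_regions(regions, max_gap=0.5):
    """Return (start, end) tuples, joining regions separated by at most max_gap seconds."""
    merged = []
    for region in sorted(regions, key=lambda r: r.start):
        if merged and region.start - merged[-1][1] <= max_gap:
            # Gap is short enough: extend the previous interval.
            merged[-1] = (merged[-1][0], max(merged[-1][1], region.end))
        else:
            merged.append((region.start, region.end))
    return merged


regions = [Region(0.0, 1.5), Region(1.75, 3.0), Region(5.0, 6.0)]
print(merge_close_regions(regions))  # [(0.0, 3.0), (5.0, 6.0)]
```

With the real pipeline, passing `output.get_timeline().support()` in place of the hand-built list should work the same way, provided the yielded segments carry `.start` and `.end` attributes.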

📄 License

This project is licensed under the MIT license.

📚 Documentation

The collected information helps the maintainers better understand the pyannote.audio user base and apply for grants to improve the library further. If you are an academic researcher, please cite the relevant papers in any publication that uses the model. If you work for a company, please consider contributing back to pyannote.audio development (e.g. through unrestricted gifts). We also provide scientific consulting services around speaker diarization and machine listening.

Required Information

Property	Details
Company/university	text
Website	text
I plan to use this model for (task, type of audio data, etc)	text

📚 Citation

@inproceedings{Bredin2021,
  Title = {{End-to-end speaker segmentation for overlap-aware resegmentation}},
  Author = {{Bredin}, Herv{\'e} and {Laurent}, Antoine},
  Booktitle = {Proc. Interspeech 2021},
  Address = {Brno, Czech Republic},
  Month = {August},
  Year = {2021},
}

@inproceedings{Bredin2020,
  Title = {{pyannote.audio: neural building blocks for speaker diarization}},
  Author = {{Bredin}, Herv{\'e} and {Yin}, Ruiqing and {Coria}, Juan Manuel and {Gelly}, Gregory and {Korshunov}, Pavel and {Lavechin}, Marvin and {Fustes}, Diego and {Titeux}, Hadrien and {Bouaziz}, Wassim and {Gill}, Marie-Philippe},
  Booktitle = {ICASSP 2020, IEEE International Conference on Acoustics, Speech, and Signal Processing},
  Address = {Barcelona, Spain},
  Month = {May},
  Year = {2020},
}
