VAD: Open-Source Voice Activity Detection Model for Identifying Speech Segments in Audio

Vad

Developed by salmanshahid

A voice activity detection model based on pyannote.audio, used to identify active speech segments in audio

Speech Recognition | Open Source | License: MIT | #Voice Activity Detection #End-to-End Segmentation #Meeting Scenario Optimization

Downloads 1,794

Release Time: 11/16/2024

Model Overview

This model is primarily used to detect voice activity in audio, accurately identifying the start and end points of speech segments. It is suitable for scenarios such as meeting recordings and speech analysis.

Model Features

High-Precision Speech Segment Detection

Accurately identifies active speech segments in audio, including start and end points

End-to-End Processing

Utilizes an end-to-end neural network architecture to simplify the processing flow

Meeting Scenario Optimization

Performs well on meeting scenario datasets such as the AMI Meeting Corpus

Model Capabilities

Voice Activity Detection

Speech Segment Time Marking

Meeting Audio Analysis

Use Cases

Meeting Recording

Meeting Speech Segmentation

Automatically detects speech segments in meeting recordings for subsequent analysis and transcription

Marks the start and end time of each speech region; attributing regions to individual speakers requires a separate speaker diarization step
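To pass detected speech to a transcription step, the time ranges usually have to be turned into sample offsets into the waveform. The following is a minimal sketch of that conversion on plain (start, end) tuples in seconds. The helper name to_sample_ranges, the 16 kHz sample rate, and the 0.25 s padding are assumptions made for illustration; they are not part of this model or of pyannote.audio.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def to_sample_ranges(
    segments: List[Segment],
    sample_rate: int,
    num_samples: int,
    padding: float = 0.0,
) -> List[Tuple[int, int]]:
    """Convert speech segments to [first, last) sample index ranges.

    Each segment is widened by `padding` seconds on both sides and then
    clipped to the bounds of the recording. Hypothetical helper.
    """
    ranges = []
    for start, end in segments:
        first = max(0, int(round((start - padding) * sample_rate)))
        last = min(num_samples, int(round((end + padding) * sample_rate)))
        if last > first:  # skip segments that fall outside the recording
            ranges.append((first, last))
    return ranges


# Assumed values: a 3-second recording sampled at 16 kHz.
segments = [(0.1, 1.5), (2.0, 3.0)]
ranges = to_sample_ranges(segments, sample_rate=16000, num_samples=48000, padding=0.25)

# Stand-in for a real waveform; any sliceable sequence or array works the same way.
waveform = [0.0] * 48000
chunks = [waveform[first:last] for first, last in ranges]
```

Padding can make neighbouring ranges touch or overlap, so a caller that needs disjoint chunks would merge them before slicing.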

Speech Analysis

Voice Activity Statistics

Analyzes the time distribution of voice activity in audio

Reports when speech occurs in the recording and for how long
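As a rough sketch of what such statistics could look like, the code below summarises a list of non-overlapping (start, end) speech segments in seconds. The function names speech_stats and speech_per_bin, the dictionary keys, and the 10-second bin size are hypothetical choices for illustration, not an API provided by this model.

```python
import math
from typing import Dict, List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def speech_stats(segments: List[Segment], total_duration: float) -> Dict[str, float]:
    """Summarise non-overlapping speech segments of one recording."""
    durations = [end - start for start, end in segments]
    speech = sum(durations)
    return {
        "num_segments": len(durations),
        "speech_seconds": speech,
        "speech_ratio": speech / total_duration if total_duration > 0 else 0.0,
        "mean_segment_seconds": speech / len(durations) if durations else 0.0,
        "longest_segment_seconds": max(durations, default=0.0),
    }


def speech_per_bin(
    segments: List[Segment], total_duration: float, bin_seconds: float
) -> List[float]:
    """Seconds of speech falling into each consecutive time bin."""
    num_bins = math.ceil(total_duration / bin_seconds)
    bins = [0.0] * num_bins
    for start, end in segments:
        for i in range(num_bins):
            lo, hi = i * bin_seconds, (i + 1) * bin_seconds
            # Length of the overlap between the segment and this bin.
            bins[i] += max(0.0, min(end, hi) - max(start, lo))
    return bins


# Assumed example: a 20-second recording with three speech segments.
segments = [(0.0, 4.0), (6.0, 8.0), (10.0, 16.0)]
stats = speech_stats(segments, total_duration=20.0)
bins = speech_per_bin(segments, total_duration=20.0, bin_seconds=10.0)
```

Both functions assume the segments do not overlap; overlapping input would count the shared time twice.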

🚀 Voice activity detection

This open-source voice activity detection model is built on pyannote.audio and detects active speech segments in audio.

🚀 Quick Start

Using this open-source model in production?
Consider switching to pyannoteAI for better and faster options.

This model relies on pyannote.audio 2.1. See the pyannote.audio installation instructions.

💻 Usage Examples

Basic Usage

# 1. visit hf.co/pyannote/segmentation and accept user conditions
# 2. visit hf.co/settings/tokens to create an access token
# 3. instantiate pretrained voice activity detection pipeline

from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained("pyannote/voice-activity-detection",
                                    use_auth_token="ACCESS_TOKEN_GOES_HERE")
output = pipeline("audio.wav")

for speech in output.get_timeline().support():
    # active speech between speech.start and speech.end
    ...
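The support() call in the loop above reduces the timeline to the smallest set of segments that covers the same time span, so overlapping or touching segments come out merged. The sketch below illustrates that idea on plain (start, end) tuples. It is not pyannote.audio's implementation; the merge_segments helper and its collar gap tolerance are hypothetical names introduced here.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def merge_segments(segments: List[Segment], collar: float = 0.0) -> List[Segment]:
    """Merge segments that overlap, touch, or lie within `collar` seconds.

    Illustrative helper only; it mirrors the idea of a timeline's support.
    """
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1] + collar:
            # Close enough to the previous segment: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


raw = [(0.5, 2.0), (1.8, 3.1), (3.1, 4.0), (6.0, 7.5)]
print(merge_segments(raw))              # [(0.5, 4.0), (6.0, 7.5)]
print(merge_segments(raw, collar=2.0))  # [(0.5, 7.5)]
```

A non-zero collar bridges short pauses, which can help when very brief silences should not split one utterance into several segments.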

📄 License

This project is licensed under the MIT license.

Gated Access Details

The collected information helps the maintainers better understand the pyannote.audio user base and apply for grants to improve the library further. If you are an academic researcher, please cite the relevant papers in any publication that uses the model. If you work for a company, please consider contributing back to pyannote.audio development (e.g. through unrestricted gifts). The maintainers also provide scientific consulting services around speaker diarization and machine listening.

Required information for gated access:

Company/university
Website
I plan to use this model for (task, type of audio data, etc)

📚 Documentation

Citation

@inproceedings{Bredin2021,
  Title = {{End-to-end speaker segmentation for overlap-aware resegmentation}},
  Author = {{Bredin}, Herv{\'e} and {Laurent}, Antoine},
  Booktitle = {Proc. Interspeech 2021},
  Address = {Brno, Czech Republic},
  Month = {August},
  Year = {2021},
}

@inproceedings{Bredin2020,
  Title = {{pyannote.audio: neural building blocks for speaker diarization}},
  Author = {{Bredin}, Herv{\'e} and {Yin}, Ruiqing and {Coria}, Juan Manuel and {Gelly}, Gregory and {Korshunov}, Pavel and {Lavechin}, Marvin and {Fustes}, Diego and {Titeux}, Hadrien and {Bouaziz}, Wassim and {Gill}, Marie-Philippe},
  Booktitle = {ICASSP 2020, IEEE International Conference on Acoustics, Speech, and Signal Processing},
  Address = {Barcelona, Spain},
  Month = {May},
  Year = {2020},
}

📦 Related Datasets and Tags

Datasets

ami
dihard
voxconverse
