Brouhaha Open-Source Multi-Task Model - Easily Achieve Voice Activity Detection, Signal-to-Noise Ratio, and Acoustic Parameter Estimation

Brouhaha

Developed by pyannote

A multi-task model for joint voice activity detection, speech signal-to-noise ratio, and C50 room acoustic parameter estimation

Speech Recognition

PyTorch

Open Source License: OpenRAIL #Joint Voice Activity Detection #Acoustic Parameter Estimation #Multi-Task Learning

Downloads 142.46k

Release Time: 10/28/2022

Model Overview

This model simultaneously performs voice activity detection (VAD) and estimates both the speech signal-to-noise ratio (SNR) and the C50 room acoustic parameter, making it suitable for audio processing and acoustic environment analysis.

Model Features

Multi-Task Joint Training

Simultaneously handles voice activity detection, SNR estimation, and room acoustic parameter estimation

Frame-Level Output

Produces one VAD, SNR, and C50 estimate per audio frame, so results can be tracked over the course of a recording

Broad Applicability

Suitable for various speech environments and acoustic scenarios

Model Capabilities

Voice Activity Detection

SNR Estimation

Room Acoustic Analysis

Audio Environment Evaluation

Use Cases

Speech Processing

Meeting Recording Enhancement

Identify valid speech and optimize recording quality

Can help improve downstream speech recognition accuracy

Acoustic Environment Assessment

Evaluate room acoustic characteristics

Can inform audio system configuration

Audio Analysis

Speech Quality Monitoring

Frame-by-frame monitoring of speech signal quality

Timely detection of audio quality issues
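
As a rough illustration of the monitoring use case, the sketch below keeps a rolling mean of SNR estimates over recent speech frames and raises a flag when that mean drops below a limit. The class name `SnrMonitor`, the window length, the 10 dB limit, and the 0.5 VAD threshold are all hypothetical choices made for this example, not part of Brouhaha or pyannote.audio. It only consumes frame scores that have already been computed.

```python
from collections import deque


class SnrMonitor:
    """Flag stretches where the rolling mean SNR over speech frames is low.

    Hypothetical helper: window size and thresholds are illustrative only.
    """

    def __init__(self, window=50, min_snr_db=10.0, vad_threshold=0.5):
        self.values = deque(maxlen=window)  # most recent speech-frame SNRs
        self.min_snr_db = min_snr_db
        self.vad_threshold = vad_threshold

    def update(self, vad, snr):
        """Feed one frame; return True when the rolling mean SNR is too low."""
        # Only frames classified as speech contribute to the rolling mean.
        if vad >= self.vad_threshold:
            self.values.append(snr)
        # Stay silent until the window has filled with speech frames.
        if len(self.values) < self.values.maxlen:
            return False
        return sum(self.values) / len(self.values) < self.min_snr_db


# Synthetic (vad, snr) pairs standing in for per-frame model output.
monitor = SnrMonitor(window=3, min_snr_db=10.0)
stream = [(0.9, 20.0), (0.9, 20.0), (0.9, 20.0), (0.1, 0.0), (0.9, 2.0), (0.9, 2.0)]
alerts = [monitor.update(vad, snr) for vad, snr in stream]
```

Ignoring non-speech frames keeps silences from dragging the average down, since the SNR estimate is only meaningful where speech is present.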

🚀 Brouhaha

Brouhaha is a model that jointly performs voice activity detection, speech-to-noise ratio estimation, and C50 room acoustics estimation.

🚀 Quick Start

For background, see the paper (arXiv:2210.13248) and the code (github.com/marianne-m/brouhaha-vad). Installation steps and a basic usage example follow.

[Figure: sample Brouhaha predictions]

✨ Features

Joint voice activity detection, speech-to-noise ratio, and C50 room acoustics estimation.
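
For readers unfamiliar with the two acoustic quantities, here is a minimal sketch of their usual textbook definitions. SNR is taken as 10·log10 of speech power over noise power, and C50 as 10·log10 of the energy in the first 50 ms of a room impulse response over the energy after 50 ms. This is not a description of how Brouhaha derives its targets or predictions. The function names are hypothetical, and the impulse response is assumed to start at the arrival of the direct sound.

```python
import math


def snr_db(speech, noise):
    """Speech-to-noise ratio in dB from separate speech and noise samples."""
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    return 10.0 * math.log10(p_speech / p_noise)


def c50_db(impulse_response, sample_rate):
    """Clarity index C50 in dB: early (first 50 ms) over late energy.

    Assumes index 0 of `impulse_response` is the direct-sound arrival.
    """
    split = int(round(0.050 * sample_rate))  # number of samples in 50 ms
    early = sum(x * x for x in impulse_response[:split])
    late = sum(x * x for x in impulse_response[split:])
    return 10.0 * math.log10(early / late)


# Toy signals: speech with 10x the noise amplitude gives roughly 20 dB SNR.
toy_snr = snr_db([1.0] * 100, [0.1] * 100)

# Toy impulse response at 1 kHz: strong first 50 ms, weak tail (roughly 20 dB).
toy_c50 = c50_db([1.0] * 50 + [0.1] * 50, sample_rate=1000)
```

A higher C50 means more of the energy arrives early, which corresponds to a less reverberant room.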

📦 Installation

This model relies on pyannote.audio and brouhaha-vad.

pip install pyannote-audio
pip install https://github.com/marianne-m/brouhaha-vad/archive/main.zip

💻 Usage Examples

Basic Usage

# 1. visit hf.co/pyannote/brouhaha and accept user conditions
# 2. visit hf.co/settings/tokens to create an access token
# 3. instantiate pretrained model
from pyannote.audio import Model
model = Model.from_pretrained("pyannote/brouhaha", 
                              use_auth_token="ACCESS_TOKEN_GOES_HERE")

# apply the model to an audio file
from pyannote.audio import Inference
inference = Inference(model)
output = inference("audio.wav")

# iterate over each frame; vad is a score between 0 and 1, printed as a percentage
for frame, (vad, snr, c50) in output:
    t = frame.middle  # centre of the frame, in seconds
    print(f"{t:8.3f} vad={100*vad:.0f}% snr={snr:.0f} c50={c50:.0f}")

# sample output:
# ...
# 12.952 vad=100% snr=51 c50=17
# 12.968 vad=100% snr=52 c50=17
# 12.985 vad=100% snr=53 c50=17
# ...
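
The frame-wise scores above are often easier to use once grouped. The following sketch, which runs on synthetic data and does not need the model, merges consecutive speech frames into segments and averages SNR and C50 over speech frames only. The helper names and the 0.5 VAD threshold are assumptions made for illustration; they are not part of pyannote.audio or brouhaha-vad.

```python
def speech_segments(frames, threshold=0.5):
    """Merge consecutive frames with vad >= threshold into (start, end) spans.

    `frames` is a time-sorted list of (time_seconds, vad, snr, c50) tuples.
    Span boundaries are frame-centre times, not exact speech onsets.
    """
    segments = []
    start = prev = None
    for t, vad, _snr, _c50 in frames:
        if vad >= threshold:
            if start is None:
                start = t  # a new speech run begins here
            prev = t
        elif start is not None:
            segments.append((start, prev))  # the run ended on the previous frame
            start = None
    if start is not None:
        segments.append((start, prev))  # close a run that reaches the end
    return segments


def mean_over_speech(frames, threshold=0.5):
    """Return (mean_snr, mean_c50) over speech frames, or None if there are none."""
    speech = [(snr, c50) for _t, vad, snr, c50 in frames if vad >= threshold]
    if not speech:
        return None
    n = len(speech)
    return sum(s for s, _ in speech) / n, sum(c for _, c in speech) / n


# Synthetic frames standing in for model output: (time, vad, snr, c50).
frames = [
    (0.00, 0.1, 5.0, 10.0),
    (0.02, 0.9, 20.0, 15.0),
    (0.04, 0.8, 22.0, 17.0),
    (0.06, 0.2, 3.0, 9.0),
    (0.08, 0.7, 18.0, 16.0),
]
segments = speech_segments(frames)
averages = mean_over_speech(frames)
```

With real output, the same list can be built from the loop shown in the usage example: `frames = [(frame.middle, vad, snr, c50) for frame, (vad, snr, c50) in output]`.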

📄 License

This model is released under the OpenRAIL license. Access is gated: the information collected will help the maintainers better understand the model's user base and apply for grants to improve it further.

Additional gated fields

Field	Input type
Company/university	text
Website	text
I plan to use this model for (task, type of audio data, etc)	text

📚 Documentation

Datasets

LibriSpeech
AudioSet
EchoThief
MIT-Acoustical-Reverberation-Scene

Citation

@article{lavechin2022brouhaha,
  Title   = {{Brouhaha: multi-task training for voice activity detection, speech-to-noise ratio, and C50 room acoustics estimation}},
  Author  = {Marvin Lavechin and Marianne Métais and Hadrien Titeux and Alodie Boissonnet and Jade Copet and Morgane Rivière and Elika Bergelson and Alejandrina Cristia and Emmanuel Dupoux and Hervé Bredin},
  Year    = {2022},
  Journal = {arXiv preprint arXiv:2210.13248}
}

@inproceedings{Bredin2020,
  Title = {{pyannote.audio: neural building blocks for speaker diarization}},
  Author = {{Bredin}, Herv{\'e} and {Yin}, Ruiqing and {Coria}, Juan Manuel and {Gelly}, Gregory and {Korshunov}, Pavel and {Lavechin}, Marvin and {Fustes}, Diego and {Titeux}, Hadrien and {Bouaziz}, Wassim and {Gill}, Marie-Philippe},
  Booktitle = {ICASSP 2020, IEEE International Conference on Acoustics, Speech, and Signal Processing},
  Address = {Barcelona, Spain},
  Month = {May},
  Year = {2020},
}
