Open-source segmentation audio model: voice activity detection, overlapped speech detection, and speaker segmentation

Segmentation

Developed by pyannote

An audio processing model for voice activity detection, overlapped speech detection, and speaker segmentation

Speaker processing

PyTorch

Open Source License: MIT #speaker-segmentation #overlapped-speech-detection #voice-activity-detection

Downloads 9.2M

Release Time : 3/2/2022

Model Overview

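This model targets speaker segmentation in audio, covering voice activity detection (VAD), overlapped speech detection (OSD), and speaker resegmentation. It identifies the speech regions in a recording, detects the parts where speech overlaps, and refines an existing speaker segmentation.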

Model Features

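End-to-end speaker segmentation

Provides a complete end-to-end solution that takes raw audio as input and outputs segmentation results directly

Overlapped speech detection

Identifies the regions of a recording in which several speakers talk at the same time

Adjustable parameters

Exposes tunable parameters, such as activation thresholds and minimum durations, so the model can be adapted to different applications

Multi-task support

Supports several related tasks: voice activity detection, overlapped speech detection, and resegmentation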

Model Capabilities

Voice activity detection

Overlapped speech detection

Speaker segmentation

Audio processing

Speaker diarization

Use Cases

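Meeting transcription

Meeting recording analysis

Automatically identifies the speech regions of the different speakers in a meeting recording

Improves the accuracy of meeting notes and transcripts

Speech analysis

Overlapped speech detection

Detects when several speakers in a conversation talk at the same time

Helps make sense of complex conversational scenes

Speech processing

Speaker segmentation refinement

Refines an existing speaker segmentation result

Improves segmentation precision and accuracy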

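🚀 Speaker segmentation model

This model is intended for speaker segmentation in audio. It supports voice activity detection, overlapped speech detection, and resegmentation for audio processing and analysis.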

🚀 Quick start

If you plan to use this open-source model in production, consider switching to pyannoteAI, which offers better and faster options.

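✨ Key features

Multi-task support: handles voice activity detection, overlapped speech detection, and resegmentation.
Visual examples: example images illustrate the segmentation output.
Reproducible research: the hyper-parameters needed to reproduce the paper's results are provided.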

📦 Installation

This model relies on pyannote.audio 2.1.1. See the installation guide for detailed instructions.
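
Because the model is tied to a specific pyannote.audio release, it can help to check what is actually installed before loading it. The following is a minimal sketch that uses only the Python standard library. The helper name `installed_version` is hypothetical, and the distribution name `pyannote.audio` is assumed to match the package as installed in your environment.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

EXPECTED = "2.1.1"  # the release named in this model card


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version("pyannote.audio")  # assumed distribution name
if found is None:
    print("pyannote.audio is not installed")
elif found != EXPECTED:
    print(f"pyannote.audio {found} found; this card was written for {EXPECTED}")
else:
    print(f"pyannote.audio {found} matches the version named in this card")
```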

💻 Usage examples

Basic usage

# 1. Visit hf.co/pyannote/segmentation and accept the user conditions
# 2. Visit hf.co/settings/tokens to create an access token
# 3. Instantiate the pretrained model
from pyannote.audio import Model
model = Model.from_pretrained("pyannote/segmentation", 
                              use_auth_token="ACCESS_TOKEN_GOES_HERE")
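
Rather than pasting the access token into source code, it can be read from the environment. The sketch below is one way to do that. The helper `read_hf_token` and the variable name `HF_TOKEN` are assumptions made for illustration, not something this model card prescribes.

```python
import os


def read_hf_token(var_name: str = "HF_TOKEN") -> str:
    """Fetch the access token from an environment variable (name is an assumption)."""
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable to your access token")
    return token


# Intended use with the loading call shown above. It is left commented out here
# because it needs pyannote.audio, network access, and accepted user conditions:
# model = Model.from_pretrained("pyannote/segmentation",
#                               use_auth_token=read_hf_token())
```

Failing early with a clear message is usually easier to debug than an authentication error raised in the middle of a model download.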

Voice activity detection

from pyannote.audio.pipelines import VoiceActivityDetection
pipeline = VoiceActivityDetection(segmentation=model)
HYPER_PARAMETERS = {
  # onset/offset activation thresholds
  "onset": 0.5, "offset": 0.5,
  # remove speech regions shorter than this many seconds
  "min_duration_on": 0.0,
  # fill non-speech regions shorter than this many seconds
  "min_duration_off": 0.0
}
pipeline.instantiate(HYPER_PARAMETERS)
vad = pipeline("audio.wav")
# `vad` is a pyannote.core.Annotation instance containing the speech regions
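
To make the four hyper-parameters concrete, here is a simplified pure-Python sketch of how per-frame speech scores could be turned into regions. It is an illustration written for this page, not the code the pipeline runs: the function name `binarize`, the `frame_step` argument, and the order of the two clean-up passes are assumptions.

```python
from typing import List, Tuple

Region = Tuple[float, float]


def binarize(scores: List[float], frame_step: float,
             onset: float = 0.5, offset: float = 0.5,
             min_duration_on: float = 0.0,
             min_duration_off: float = 0.0) -> List[Region]:
    """Turn per-frame speech scores into (start, end) regions in seconds."""
    regions: List[Region] = []
    active, start = False, 0.0
    for i, score in enumerate(scores):
        t = i * frame_step
        if not active and score > onset:
            active, start = True, t      # score rose above onset: open a region
        elif active and score < offset:
            regions.append((start, t))   # score fell below offset: close it
            active = False
    if active:                           # region still open at the end of the audio
        regions.append((start, len(scores) * frame_step))

    # Fill non-speech gaps shorter than min_duration_off.
    merged: List[Region] = []
    for seg in regions:
        if merged and seg[0] - merged[-1][1] < min_duration_off:
            merged[-1] = (merged[-1][0], seg[1])
        else:
            merged.append(seg)

    # Remove speech regions shorter than min_duration_on.
    return [(s, e) for s, e in merged if e - s >= min_duration_on]


toy_scores = [0.1, 0.9, 0.9, 0.2, 0.8, 0.1]  # made-up scores, one frame per second
print(binarize(toy_scores, frame_step=1.0))
print(binarize(toy_scores, frame_step=1.0, min_duration_off=1.5))
```

Setting `onset` higher than `offset` gives hysteresis: a region needs a strong score to open but tolerates weaker scores before it closes, which tends to reduce flickering at region boundaries.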

Overlapped speech detection

from pyannote.audio.pipelines import OverlappedSpeechDetection
pipeline = OverlappedSpeechDetection(segmentation=model)
pipeline.instantiate(HYPER_PARAMETERS)
osd = pipeline("audio.wav")
# `osd` is a pyannote.core.Annotation instance containing the overlapped speech regions
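
One way to think about overlap is in terms of per-speaker activations: a frame is overlapped when at least two speakers are active, so the second-highest activation in that frame can serve as an overlap score. The sketch below illustrates that idea on made-up numbers. The function `overlap_scores` is hypothetical, and this is an assumption about how such a score could be derived, not a description of the pipeline's internals.

```python
from typing import List


def overlap_scores(speaker_scores: List[List[float]]) -> List[float]:
    """Per-frame overlap score: the second-highest speaker activation.

    `speaker_scores[t][k]` is the activation of speaker k at frame t.
    """
    result = []
    for frame in speaker_scores:
        ranked = sorted(frame, reverse=True)
        # With fewer than two speakers there can be no overlap.
        result.append(ranked[1] if len(ranked) > 1 else 0.0)
    return result


frames = [
    [0.9, 0.1, 0.0],  # one speaker active
    [0.8, 0.7, 0.1],  # two speakers active at once
    [0.1, 0.2, 0.0],  # nobody clearly active
]
print(overlap_scores(frames))
```

The resulting per-frame scores can then be thresholded in the same way as speech scores, which is why the same kind of onset, offset, and minimum-duration parameters apply.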

Resegmentation

from pyannote.audio.pipelines import Resegmentation
pipeline = Resegmentation(segmentation=model, 
                          diarization="baseline")
pipeline.instantiate(HYPER_PARAMETERS)
resegmented_baseline = pipeline({"audio": "audio.wav", "baseline": baseline})
# where `baseline` must be provided as a pyannote.core.Annotation instance

Raw scores

from pyannote.audio import Inference
inference = Inference(model)
segmentation = inference("audio.wav")
# `segmentation` is a pyannote.core.SlidingWindowFeature instance holding the raw segmentation scores (the model output), as shown in the figure above
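
A sliding-window feature pairs an array of scores with a window description (start, step, duration) that says which time span each row covers. The sketch below shows that index-to-time mapping in plain Python. The helper `frame_extent` and the window values are hypothetical and chosen only for illustration; they are not the model's actual frame parameters.

```python
from typing import Tuple


def frame_extent(index: int, start: float, step: float, duration: float) -> Tuple[float, float]:
    """Return (start, end) in seconds of the index-th position of a sliding window."""
    frame_start = start + index * step
    return frame_start, frame_start + duration


# Hypothetical window parameters, for illustration only.
WINDOW_START, WINDOW_STEP, WINDOW_DURATION = 0.0, 0.5, 1.0
for i in range(3):
    print(i, frame_extent(i, WINDOW_START, WINDOW_STEP, WINDOW_DURATION))
```

When the step is smaller than the duration, consecutive windows overlap in time, so a given instant can be covered by several rows of scores.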

📚 Documentation

Paper: End-to-end speaker segmentation for overlap-aware resegmentation
Demo: Demo
Blog post: One-speaker segmentation model to rule them all

📄 License

This project is released under the MIT License.

🔗 Citation

@inproceedings{Bredin2021,
  Title = {{End-to-end speaker segmentation for overlap-aware resegmentation}},
  Author = {{Bredin}, Herv{\'e} and {Laurent}, Antoine},
  Booktitle = {Proc. Interspeech 2021},
  Address = {Brno, Czech Republic},
  Month = {August},
  Year = {2021},
}

@inproceedings{Bredin2020,
  Title = {{pyannote.audio: neural building blocks for speaker diarization}},
  Author = {{Bredin}, Herv{\'e} and {Yin}, Ruiqing and {Coria}, Juan Manuel and {Gelly}, Gregory and {Korshunov}, Pavel and {Lavechin}, Marvin and {Fustes}, Diego and {Titeux}, Hadrien and {Bouaziz}, Wassim and {Gill}, Marie-Philippe},
  Booktitle = {ICASSP 2020, IEEE International Conference on Acoustics, Speech, and Signal Processing},
  Address = {Barcelona, Spain},
  Month = {May},
  Year = {2020},
}