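Segmentation: an open-source speaker segmentation model, free to use for voice activity detection, overlapped speech detection, and related tasks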

Segmentation

Developed by salmanshahid

An end-to-end speaker segmentation model for voice activity detection, overlapped speech detection, and resegmentation.

Speaker processing

TensorBoard

License: MIT #overlapped-speech-detection #end-to-end-segmentation #speaker-resegmentation

Downloads: 1,790

Released: 11/16/2024

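Model overview

The model addresses speaker segmentation in audio: it detects voice activity, identifies overlapped speech, and can refine an existing speaker diarization result.

Model features

End-to-end speaker segmentation

Handles speaker segmentation end to end, which simplifies the traditional multi-stage pipeline.

Overlapped speech detection

Identifies regions of the audio where several speakers talk at the same time.

Resegmentation

Refines an existing speaker diarization result.

Multi-dataset training

Trained on several datasets, including AMI, DIHARD3, and VoxConverse.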

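Model capabilities

Voice activity detection

Overlapped speech detection

Resegmentation of speaker diarization output

Audio analysis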

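Use cases

Speech analysis

Meeting recording analysis

Analyzes speaker changes and overlapped speech in meeting recordings.

Identifies the speech segments that belong to each speaker.

Preprocessing for speech transcription

Provides speech recognition systems with more accurate speaker segmentation.

Improves the ability of a transcription system to tell speakers apart.

Audio processing

Audio editing assistance

Helps audio editors quickly locate the speech segments of different speakers.

Speeds up audio editing.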

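🚀 pyannote.audio // speaker segmentation

This is a speaker segmentation model that can be used for voice activity detection, overlapped speech detection, and resegmentation. It comes from the paper End-to-end speaker segmentation for overlap-aware resegmentation, by Hervé Bredin and Antoine Laurent.

The model relies on pyannote.audio 2.0, which is currently in development: see the installation instructions.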

🚀 Quick start

The model can be used for voice activity detection, overlapped speech detection, and resegmentation. Each use is described below.

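✨ Key features

Multi-task support: voice activity detection, overlapped speech detection, and resegmentation.
Paper-based method: the model implements the end-to-end speaker segmentation approach proposed in the paper cited above.
Development-version dependency: the model requires pyannote.audio 2.0, which is currently in development.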

📦 Installation

The model relies on pyannote.audio 2.0, which is currently in development. See the pyannote.audio installation instructions.

💻 Usage examples

Basic usage

Voice activity detection

from pyannote.audio.pipelines import VoiceActivityDetection
pipeline = VoiceActivityDetection(segmentation="pyannote/segmentation")
HYPER_PARAMETERS = {
  # onset/offset activation thresholds
  "onset": 0.5, "offset": 0.5,
  # remove speech regions shorter than that many seconds.
  "min_duration_on": 0.0,
  # fill non-speech regions shorter than that many seconds.
  "min_duration_off": 0.0
}
pipeline.instantiate(HYPER_PARAMETERS)
vad = pipeline("audio.wav")
# `vad` is a pyannote.core.Annotation instance containing speech regions
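
To make the four hyper-parameters above concrete, here is a minimal pure-Python sketch of hysteresis thresholding on per-frame speech scores. It is an illustration under stated assumptions, not pyannote.audio's actual implementation: the helper name `binarize_scores`, the constant `frame_step`, the toy scores, and the order of the two post-processing steps (gaps filled first, short regions removed second) are all assumptions made for this example.

```python
def binarize_scores(scores, frame_step, onset=0.5, offset=0.5,
                    min_duration_on=0.0, min_duration_off=0.0):
    """Turn per-frame scores into (start, end) speech regions, in seconds.

    Hypothetical helper for illustration only. Frame i is assumed to cover
    [i * frame_step, (i + 1) * frame_step).
    """
    regions = []
    active = False
    start = 0.0
    for i, score in enumerate(scores):
        t = i * frame_step
        if not active and score > onset:
            # score rises above `onset`: a speech region starts
            active, start = True, t
        elif active and score < offset:
            # score falls below `offset`: the current region ends
            regions.append((start, t))
            active = False
    if active:
        # close a region that is still open at the end of the scores
        regions.append((start, len(scores) * frame_step))

    # fill non-speech gaps shorter than `min_duration_off`
    merged = []
    for region in regions:
        if merged and region[0] - merged[-1][1] < min_duration_off:
            merged[-1] = (merged[-1][0], region[1])
        else:
            merged.append(region)

    # remove speech regions shorter than `min_duration_on`
    return [(s, e) for (s, e) in merged if e - s >= min_duration_on]


toy_scores = [0.1, 0.9, 0.8, 0.6, 0.2, 0.9, 0.1, 0.1, 0.1, 0.9]

# thresholds only: three regions
print(binarize_scores(toy_scores, 0.25, onset=0.7, offset=0.4))
# [(0.25, 1.0), (1.25, 1.5), (2.25, 2.5)]

# with duration constraints: the short gap is filled, the short region dropped
print(binarize_scores(toy_scores, 0.25, onset=0.7, offset=0.4,
                      min_duration_on=0.5, min_duration_off=0.5))
# [(0.25, 1.5)]
```

Note how the frame with score 0.6 stays inside the first region: it is below `onset` but not below `offset`, which is what using two thresholds buys over a single one.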

Overlapped speech detection

from pyannote.audio.pipelines import OverlappedSpeechDetection
pipeline = OverlappedSpeechDetection(segmentation="pyannote/segmentation")
# reuses the HYPER_PARAMETERS dictionary defined in the voice activity detection example
pipeline.instantiate(HYPER_PARAMETERS)
osd = pipeline("audio.wav")
# `osd` is a pyannote.core.Annotation instance containing overlapped speech regions
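
As a rough intuition for what overlapped speech means at the frame level, the sketch below derives an overlap score from per-speaker activation scores: a frame counts as overlapped when its second-highest speaker score is high. This is an assumption about how overlap can be read off multi-speaker scores, not a description of the pipeline's internals; the function name `overlap_scores`, the 0.5 threshold, and the toy values are hypothetical.

```python
def overlap_scores(frame_scores):
    """Per-frame overlap score: the second-highest speaker activation.

    `frame_scores` is a list of frames, each a list of per-speaker scores.
    Hypothetical helper for illustration only.
    """
    result = []
    for speakers in frame_scores:
        top_two = sorted(speakers, reverse=True)[:2]
        # with fewer than two speakers there can be no overlap
        result.append(top_two[1] if len(top_two) == 2 else 0.0)
    return result


toy_frames = [
    [0.9, 0.1, 0.0],  # one active speaker
    [0.8, 0.7, 0.1],  # two active speakers
    [0.2, 0.9, 0.6],  # two active speakers
    [0.1, 0.2, 0.1],  # silence
]
per_frame = overlap_scores(toy_frames)        # [0.1, 0.7, 0.6, 0.1]
overlapped = [s > 0.5 for s in per_frame]     # [False, True, True, False]
print(per_frame, overlapped)
```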

Resegmentation

from pyannote.audio.pipelines import Resegmentation
pipeline = Resegmentation(segmentation="pyannote/segmentation", 
                          diarization="baseline")
# reuses the HYPER_PARAMETERS dictionary defined in the voice activity detection example
pipeline.instantiate(HYPER_PARAMETERS)
resegmented_baseline = pipeline({"audio": "audio.wav", "baseline": baseline})
# where `baseline` should be provided as a pyannote.core.Annotation instance
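
One way to obtain such a `baseline` is sketched below, assuming the baseline diarization is stored as RTTM text (an assumption; the source does not say which format you have). `parse_rttm` and `to_annotation` are hypothetical helper names. The parser is pure Python; the conversion step needs `pyannote.core` and is therefore imported lazily and left commented out in the usage lines.

```python
def parse_rttm(text):
    """Parse RTTM text into (file_id, start, end, speaker) tuples.

    Assumes the usual RTTM layout:
    SPEAKER <file> <channel> <start> <duration> <NA> <NA> <speaker> <NA> <NA>
    """
    turns = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue  # skip blank or non-speaker lines
        file_id = fields[1]
        start = float(fields[3])
        duration = float(fields[4])
        speaker = fields[7]
        turns.append((file_id, start, start + duration, speaker))
    return turns


def to_annotation(turns, uri):
    """Build a pyannote.core.Annotation from parsed turns for one file."""
    from pyannote.core import Annotation, Segment  # lazy: needs pyannote.core

    annotation = Annotation(uri=uri)
    for file_id, start, end, speaker in turns:
        if file_id == uri:
            annotation[Segment(start, end)] = speaker
    return annotation


example_rttm = """SPEAKER audio 1 0.50 2.00 <NA> <NA> spk_A <NA> <NA>
SPEAKER audio 1 2.25 1.25 <NA> <NA> spk_B <NA> <NA>"""

turns = parse_rttm(example_rttm)
print(turns)
# baseline = to_annotation(turns, uri="audio")  # requires pyannote.core
```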

Raw scores

from pyannote.audio import Inference
inference = Inference("pyannote/segmentation")
segmentation = inference("audio.wav")
# `segmentation` is a pyannote.core.SlidingWindowFeature
# instance containing raw segmentation scores

Advanced usage

To reproduce the results of the paper "End-to-end speaker segmentation for overlap-aware resegmentation", use the following hyper-parameters:

Task	Dataset	`onset`	`offset`	`min_duration_on`	`min_duration_off`
Voice activity detection	AMI Mix-Headset	0.684	0.577	0.181	0.037
Voice activity detection	DIHARD3	0.767	0.377	0.136	0.067
Voice activity detection	VoxConverse	0.767	0.713	0.182	0.501
Overlapped speech detection	AMI Mix-Headset	0.448	0.362	0.116	0.187
Overlapped speech detection	DIHARD3	0.430	0.320	0.091	0.144
Overlapped speech detection	VoxConverse	0.587	0.426	0.337	0.112
Resegmentation of VBx	AMI Mix-Headset	0.542	0.527	0.044	0.705
Resegmentation of VBx	DIHARD3	0.592	0.489	0.163	0.182
Resegmentation of VBx	VoxConverse	0.537	0.724	0.410	0.563

Expected outputs (and the VBx baseline) are also provided in the /reproducible_research sub-directory.
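
The table above can be kept in code as a small lookup, so the right values are passed to `pipeline.instantiate` without retyping them. The numbers are copied from the table; the dictionary layout, the short task keys (`"vad"`, `"osd"`, `"resegmentation"`), and the helper name `hyper_parameters` are illustrative assumptions.

```python
# (onset, offset, min_duration_on, min_duration_off), copied from the table.
# Key names are assumptions made for this sketch.
PAPER_HYPER_PARAMETERS = {
    ("vad", "AMI Mix-Headset"): (0.684, 0.577, 0.181, 0.037),
    ("vad", "DIHARD3"): (0.767, 0.377, 0.136, 0.067),
    ("vad", "VoxConverse"): (0.767, 0.713, 0.182, 0.501),
    ("osd", "AMI Mix-Headset"): (0.448, 0.362, 0.116, 0.187),
    ("osd", "DIHARD3"): (0.430, 0.320, 0.091, 0.144),
    ("osd", "VoxConverse"): (0.587, 0.426, 0.337, 0.112),
    ("resegmentation", "AMI Mix-Headset"): (0.542, 0.527, 0.044, 0.705),
    ("resegmentation", "DIHARD3"): (0.592, 0.489, 0.163, 0.182),
    ("resegmentation", "VoxConverse"): (0.537, 0.724, 0.410, 0.563),
}

NAMES = ("onset", "offset", "min_duration_on", "min_duration_off")


def hyper_parameters(task, dataset):
    """Return the hyper-parameter dictionary for one (task, dataset) pair."""
    return dict(zip(NAMES, PAPER_HYPER_PARAMETERS[(task, dataset)]))


params = hyper_parameters("vad", "DIHARD3")
print(params)
# pipeline.instantiate(params)  # with `pipeline` built as in the basic usage examples
```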

📚 Documentation

Support

Commercial enquiries and scientific consulting: please contact me.
Technical questions and bug reports: please use the pyannote.audio GitHub repository.

📄 License

This project is released under the MIT license.

📚 Citation

If you use this project, please cite the following papers:

@inproceedings{Bredin2021,
  Title = {{End-to-end speaker segmentation for overlap-aware resegmentation}},
  Author = {{Bredin}, Herv{\'e} and {Laurent}, Antoine},
  Booktitle = {Proc. Interspeech 2021},
  Address = {Brno, Czech Republic},
  Month = {August},
  Year = {2021},
}

@inproceedings{Bredin2020,
  Title = {{pyannote.audio: neural building blocks for speaker diarization}},
  Author = {{Bredin}, Herv{\'e} and {Yin}, Ruiqing and {Coria}, Juan Manuel and {Gelly}, Gregory and {Korshunov}, Pavel and {Lavechin}, Marvin and {Fustes}, Diego and {Titeux}, Hadrien and {Bouaziz}, Wassim and {Gill}, Marie-Philippe},
  Booktitle = {ICASSP 2020, IEEE International Conference on Acoustics, Speech, and Signal Processing},
  Address = {Barcelona, Spain},
  Month = {May},
  Year = {2020},
}