Open-source audio segmentation model - voice activity detection, overlapped speech detection, and speaker segmentation


Segmentation

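Developed by pyannote

An audio processing model for voice activity detection, overlapped speech detection, and speaker segmentation

Speaker processing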

PyTorch

License: MIT #speaker-segmentation #overlapped-speech-detection #voice-activity-detection

Downloads: 9.2M

Release date: 3/2/2022

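Model overview

The model performs speaker segmentation on audio, covering voice activity detection (VAD), overlapped speech detection (OSD), and speaker resegmentation. It locates speech regions, detects where several speakers talk at once, and refines existing speaker segmentation results.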

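Model features

End-to-end speaker segmentation

Takes raw audio as input and directly outputs segmentation results

Overlapped speech detection

Identifies regions where multiple speakers talk at the same time

Adjustable parameters

Exposes tunable hyperparameters, such as onset/offset activation thresholds and minimum region durations, to adapt to different use cases

Multi-task support

Supports related tasks including voice activity detection, overlapped speech detection, and resegmentation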

Model capabilities

Voice activity detection

Overlapped speech detection

Speaker segmentation

Audio processing

Speaker diarization

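Use cases

Meeting recordings

Meeting recording analysis

Automatically identifies the speech regions of different speakers in a meeting recording

Improves the accuracy of meeting notes and transcripts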
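A common follow-up once speaker regions are known is to total each speaker's speaking time. The sketch below is minimal and assumes the regions have already been exported as `(start, end, label)` tuples in seconds, for example from a diarization `Annotation` via `itertracks(yield_label=True)`. The `speaking_time` helper is hypothetical and is not part of pyannote.audio.

```python
from collections import defaultdict


def speaking_time(segments):
    """Sum the duration (in seconds) spoken by each speaker label.

    `segments` is an iterable of (start, end, label) tuples; this format is an
    assumption of the sketch, not a pyannote.audio data structure.
    """
    totals = defaultdict(float)
    for start, end, label in segments:
        if end < start:
            raise ValueError(f"segment ends before it starts: {(start, end, label)}")
        totals[label] += end - start
    return dict(totals)


meeting = [
    (0.0, 4.5, "SPEAKER_00"),
    (4.0, 9.0, "SPEAKER_01"),
    (9.5, 12.0, "SPEAKER_00"),
]
print(speaking_time(meeting))  # {'SPEAKER_00': 7.0, 'SPEAKER_01': 5.0}
```

Overlapping speech (4.0 s to 4.5 s here) counts toward both speakers. This is usually what you want when measuring participation.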

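Speech analysis

Overlapped speech detection

Detects moments in a conversation when several speakers talk at once

Helps in understanding complex conversational scenes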
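When per-speaker regions are already available, you can also locate overlap after the fact by intersecting two speakers' segment lists. This is a post-hoc analysis sketch, not how the model detects overlap (the model works directly from audio). It assumes each list holds sorted, non-overlapping `(start, end)` tuples in seconds.

```python
def intersect(a, b):
    """Return the time ranges where both segment lists are active.

    `a` and `b` are lists of (start, end) tuples in seconds, each sorted by
    start time and non-overlapping within itself (an assumption of this sketch).
    """
    i = j = 0
    overlaps = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            overlaps.append((start, end))
        # Advance whichever segment finishes first; the other may still overlap.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return overlaps


speaker_a = [(0.0, 4.5), (9.5, 12.0)]
speaker_b = [(4.0, 9.0), (11.0, 13.0)]
print(intersect(speaker_a, speaker_b))  # [(4.0, 4.5), (11.0, 12.0)]
```

The two-pointer walk runs in linear time in the total number of segments. With more than two speakers, you would apply it pairwise.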

Speech processing

Speaker segmentation refinement

Refines existing speaker segmentation results

Improves segmentation accuracy

🚀 Speaker segmentation model

This model performs speaker segmentation on audio and supports voice activity detection, overlapped speech detection, and resegmentation.

🚀 Quick start

If you plan to use this open-source model in production, consider switching to pyannoteAI for better and faster options.

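✨ Key features

Multi-task support: voice activity detection, overlapped speech detection, and resegmentation.
Visual examples: example images illustrate the segmentation output.
Reproducible research: the hyperparameters needed to reproduce the paper's results are provided.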

📦 Installation

This model relies on pyannote.audio 2.1.1. See the pyannote.audio installation guide for details.

💻 Usage examples

Basic usage

# 1. visit hf.co/pyannote/segmentation and accept the user conditions
# 2. visit hf.co/settings/tokens to create an access token
# 3. instantiate the pretrained model
from pyannote.audio import Model
model = Model.from_pretrained("pyannote/segmentation",
                              use_auth_token="ACCESS_TOKEN_GOES_HERE")
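You can avoid hard-coding the access token by reading it from the environment. This is a minimal sketch. The variable name `HF_TOKEN` and the helper `get_access_token` are assumptions for illustration; use whatever name your deployment defines.

```python
import os


def get_access_token(env_var="HF_TOKEN"):
    """Read a Hugging Face access token from an environment variable.

    The default variable name is an assumption of this sketch.
    """
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"set the {env_var} environment variable to your access token")
    return token


# Usage (requires pyannote.audio 2.1.1 and accepted user conditions):
# from pyannote.audio import Model
# model = Model.from_pretrained("pyannote/segmentation",
#                               use_auth_token=get_access_token())
```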

Voice activity detection

from pyannote.audio.pipelines import VoiceActivityDetection
pipeline = VoiceActivityDetection(segmentation=model)
HYPER_PARAMETERS = {
  # onset/offset activation thresholds
  "onset": 0.5, "offset": 0.5,
  # remove speech regions shorter than this many seconds
  "min_duration_on": 0.0,
  # fill non-speech regions shorter than this many seconds
  "min_duration_off": 0.0
}
pipeline.instantiate(HYPER_PARAMETERS)
vad = pipeline("audio.wav")
# `vad` is a pyannote.core.Annotation instance containing speech regions
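The `onset`/`offset` and `min_duration_*` hyperparameters above control how frame-level scores become speech regions. The sketch below shows that idea in plain Python. It assumes a 1-D list of speech scores and a fixed, hypothetical frame step. `binarize` is an illustrative helper, not pyannote.audio's implementation. The order of the post-processing steps (fill short gaps, then drop short regions) is this sketch's own choice.

```python
def binarize(scores, step, onset=0.5, offset=0.5,
             min_duration_on=0.0, min_duration_off=0.0):
    """Turn frame-level speech scores into (start, end) regions in seconds.

    Hysteresis: a region starts when a score rises above `onset` and ends when
    it drops below `offset`. Frame i is assumed to cover [i * step, (i + 1) * step).
    """
    regions = []
    active, region_start = False, 0.0
    for i, score in enumerate(scores):
        t = i * step
        if not active and score > onset:
            active, region_start = True, t
        elif active and score < offset:
            regions.append((region_start, t))
            active = False
    if active:  # close a region still open at the end of the signal
        regions.append((region_start, len(scores) * step))

    # Fill non-speech gaps shorter than min_duration_off.
    merged = []
    for s, e in regions:
        if merged and s - merged[-1][1] < min_duration_off:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))

    # Remove speech regions shorter than min_duration_on.
    return [(s, e) for s, e in merged if e - s >= min_duration_on]


scores = [0.1, 0.7, 0.6, 0.4, 0.2, 0.8, 0.3, 0.9, 0.9, 0.1]
print(binarize(scores, step=0.5, onset=0.6, offset=0.35))
# [(0.5, 2.0), (2.5, 3.0), (3.5, 4.5)]
print(binarize(scores, step=0.5, onset=0.6, offset=0.35, min_duration_off=1.0))
# [(0.5, 4.5)]
```

With `onset` above `offset`, a score hovering between the two (0.4 at frame 3) does not end the region. This suppresses rapid on/off flicker.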

Overlapped speech detection

from pyannote.audio.pipelines import OverlappedSpeechDetection
pipeline = OverlappedSpeechDetection(segmentation=model)
pipeline.instantiate(HYPER_PARAMETERS)
osd = pipeline("audio.wav")
# `osd` is a pyannote.core.Annotation instance containing overlapped speech regions
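Because the model outputs one activation per speaker per frame, both VAD and OSD can be seen as reductions of those activations to a single score per frame. The sketch below shows one plausible reduction in plain Python. The speech score is the highest activation (someone is speaking), and the overlap score is the second-highest (at least two people are). The exact aggregation inside pyannote.audio's pipelines may differ, and the input format is an assumption.

```python
def speech_and_overlap_scores(activations):
    """Reduce per-frame, per-speaker activations to two frame-level scores.

    `activations` is a list of frames, each a list of per-speaker
    probabilities in [0, 1] (assumed format). Requires at least two speakers.
    """
    speech, overlap = [], []
    for frame in activations:
        if len(frame) < 2:
            raise ValueError("need at least two speaker activations per frame")
        top = sorted(frame, reverse=True)
        speech.append(top[0])   # highest activation: is anyone speaking?
        overlap.append(top[1])  # second-highest: is a second speaker active too?
    return speech, overlap


frames = [
    [0.9, 0.1, 0.0],  # one speaker
    [0.8, 0.7, 0.1],  # two speakers overlap
    [0.1, 0.0, 0.2],  # silence
]
speech, overlap = speech_and_overlap_scores(frames)
print(speech)   # [0.9, 0.8, 0.2]
print(overlap)  # [0.1, 0.7, 0.1]
```

Each resulting score sequence can then be thresholded with the same onset/offset hyperparameters used for VAD.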

Resegmentation

from pyannote.audio.pipelines import Resegmentation
pipeline = Resegmentation(segmentation=model,
                          diarization="baseline")
pipeline.instantiate(HYPER_PARAMETERS)
resegmented_baseline = pipeline({"audio": "audio.wav", "baseline": baseline})
# where `baseline` should be provided as a pyannote.core.Annotation instance
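Resegmentation has to reconcile the model's local, anonymous speaker indices with the speaker labels of the baseline diarization. The toy sketch below illustrates that mapping step only. It is not the paper's algorithm or pyannote.audio's `Resegmentation` pipeline. It assumes frame-aligned binary inputs and picks the local-to-baseline assignment that agrees on the most frames. Brute-force search over permutations is only practical for a handful of speakers; an assignment solver such as `scipy.optimize.linear_sum_assignment` is the usual substitute.

```python
from itertools import permutations


def map_local_speakers(local, baseline, baseline_speakers):
    """Map local speaker indices to baseline labels by maximum frame agreement.

    local: list of frames, each a list of 0/1 activations for K local speakers.
    baseline: list of frames, each a set of baseline labels active in that frame.
    baseline_speakers: candidate baseline labels (at least K of them).
    """
    k = len(local[0])
    if len(baseline_speakers) < k:
        raise ValueError("need at least as many baseline speakers as local speakers")
    best_score, best_mapping = -1, None
    for labels in permutations(baseline_speakers, k):
        score = 0
        for frame_local, frame_base in zip(local, baseline):
            for idx, label in enumerate(labels):
                # Agreement: both active or both inactive in this frame.
                score += int(bool(frame_local[idx]) == (label in frame_base))
        if score > best_score:
            best_score, best_mapping = score, dict(enumerate(labels))
    return best_mapping


def resegment(local, mapping):
    """Relabel each frame with the baseline labels of its active local speakers."""
    return [{mapping[i] for i, on in enumerate(frame) if on} for frame in local]


local = [[1, 0], [1, 1], [0, 1], [0, 1]]
baseline = [{"A"}, {"A"}, {"B"}, {"B"}]
mapping = map_local_speakers(local, baseline, ["A", "B"])
print(mapping)  # {0: 'A', 1: 'B'}
print(resegment(local, mapping))
```

In the second frame the baseline lists only speaker A, but the local activations mark both speakers. The resegmented output therefore recovers an overlap the baseline missed.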

Raw scores

from pyannote.audio import Inference
inference = Inference(model)
segmentation = inference("audio.wav")
# `segmentation` is a pyannote.core.SlidingWindowFeature instance
# containing raw segmentation scores (output), like the example pictured above
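Raw scores are indexed by frame, so reading them requires converting frame indices to timestamps. The plain-Python sketch below makes that relationship concrete. The `Window` class, `frames_above`, and the timing values are all hypothetical. They only mimic the idea of a sliding window; pyannote.core provides its own `SlidingWindow`, whose attributes should be checked in its documentation. For simplicity the sketch uses one score per frame instead of one per speaker.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Hypothetical frame timing: frame i spans [start + i*step, start + i*step + duration]."""
    start: float
    duration: float
    step: float

    def frame_span(self, i):
        begin = self.start + i * self.step
        return begin, begin + self.duration

    def frame_middle(self, i):
        begin, end = self.frame_span(i)
        return (begin + end) / 2


def frames_above(scores, window, threshold):
    """Return (middle_time, score) for every frame whose score exceeds `threshold`."""
    return [(window.frame_middle(i), s) for i, s in enumerate(scores) if s > threshold]


window = Window(start=0.0, duration=0.5, step=0.25)
scores = [0.1, 0.2, 0.9, 0.8, 0.3]
print(frames_above(scores, window, threshold=0.5))  # [(0.75, 0.9), (1.0, 0.8)]
```

Consecutive frames overlap whenever `duration` exceeds `step`. This is why per-frame scores are usually aggregated before they are thresholded into regions.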

📚 Documentation

Paper: End-to-end speaker segmentation for overlap-aware resegmentation
Demo: Demo
Blog post: One-speaker segmentation model to rule them all

📄 License

This project is released under the MIT license.

🔗 Citation

@inproceedings{Bredin2021,
  Title = {{End-to-end speaker segmentation for overlap-aware resegmentation}},
  Author = {{Bredin}, Herv{\'e} and {Laurent}, Antoine},
  Booktitle = {Proc. Interspeech 2021},
  Address = {Brno, Czech Republic},
  Month = {August},
  Year = {2021},
}

@inproceedings{Bredin2020,
  Title = {{pyannote.audio: neural building blocks for speaker diarization}},
  Author = {{Bredin}, Herv{\'e} and {Yin}, Ruiqing and {Coria}, Juan Manuel and {Gelly}, Gregory and {Korshunov}, Pavel and {Lavechin}, Marvin and {Fustes}, Diego and {Titeux}, Hadrien and {Bouaziz}, Wassim and {Gill}, Marie-Philippe},
  Booktitle = {ICASSP 2020, IEEE International Conference on Acoustics, Speech, and Signal Processing},
  Address = {Barcelona, Spain},
  Month = {May},
  Year = {2020},
}