Pyannote Segmentation: an open-source speaker segmentation model for voice activity detection, overlapped speech detection, and related tasks

Pyannote Segmentation

Developed by philschmid

An end-to-end speaker segmentation model that supports voice activity detection, overlapped speech detection, and resegmentation.

Speaker processing

PyTorch

Open-source license: MIT #overlapped-speech-detection #speaker-segmentation #end-to-end-resegmentation

Downloads: 427

Release date: 11/8/2022

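Model overview

This model is used mainly for speaker segmentation in audio processing. It detects voice activity, identifies overlapped speech regions, and refines a baseline segmentation through resegmentation.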

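Model features

End-to-end speaker segmentation

Uses an end-to-end architecture that handles speaker segmentation directly, which simplifies the processing pipeline.

Overlapped speech detection

Identifies overlapped regions in which several speakers talk at the same time.

Resegmentation

Refines a baseline segmentation to improve its accuracy.

Multi-dataset validation

Evaluated on several standard datasets, including AMI, DIHARD3, and VoxConverse.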

Model capabilities

Voice activity detection

Overlapped speech detection

Speaker segmentation refinement

Audio feature extraction

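Use cases

Meeting records

Meeting audio segmentation

Automatically splits a meeting recording into segments by speaker.

Validated on the AMI dataset.

Speech analysis

Overlapped speech detection

Identifies moments in a conversation when several people speak at once.

Validated on the DIHARD3 dataset.

Speech processing optimization

Segmentation refinement

Refines and improves an existing speech segmentation.

Validated on the VoxConverse dataset.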

🚀 Speaker segmentation

This is an end-to-end speaker segmentation model that enables overlap-aware speaker resegmentation.

[Figure: Example]

This model comes from the paper End-to-end speaker segmentation for overlap-aware resegmentation, by Hervé Bredin and Antoine Laurent.

An online demo is available as a Hugging Face Space.

🚀 Quick start

This model relies on pyannote.audio 2.0, which is currently in development. See the installation instructions.

✨ Main features

Voice activity detection
Overlapped speech detection
Resegmentation

💻 Usage examples

Basic usage

Voice activity detection

from pyannote.audio.pipelines import VoiceActivityDetection
pipeline = VoiceActivityDetection(segmentation="pyannote/segmentation")
HYPER_PARAMETERS = {
  # onset/offset activation thresholds
  "onset": 0.5, "offset": 0.5,
  # remove speech regions shorter than that many seconds.
  "min_duration_on": 0.0,
  # fill non-speech regions shorter than that many seconds.
  "min_duration_off": 0.0
}
pipeline.instantiate(HYPER_PARAMETERS)
vad = pipeline("audio.wav")
# `vad` is a pyannote.core.Annotation instance containing speech regions
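
The four hyperparameters above are easier to reason about with a toy implementation. The following is a minimal pure-Python sketch of hysteresis thresholding followed by gap filling and short-region removal. The function name `binarize`, the `frame_step` argument, and the order of the two post-processing steps are assumptions made for illustration; this is not pyannote.audio's own code.

```python
from typing import List, Tuple

Region = Tuple[float, float]  # (start, end) in seconds


def binarize(scores: List[float], frame_step: float,
             onset: float = 0.5, offset: float = 0.5,
             min_duration_on: float = 0.0,
             min_duration_off: float = 0.0) -> List[Region]:
    """Turn per-frame speech scores into a list of (start, end) regions."""
    regions: List[Region] = []
    active = False
    start = 0.0
    for i, score in enumerate(scores):
        t = i * frame_step
        if not active and score > onset:
            # Score rose above `onset`: a speech region begins.
            active, start = True, t
        elif active and score < offset:
            # Score fell below `offset`: the region ends.
            regions.append((start, t))
            active = False
    if active:
        # Close a region that is still open at the end of the file.
        regions.append((start, len(scores) * frame_step))

    # Fill non-speech gaps shorter than `min_duration_off`.
    merged: List[Region] = []
    for region in regions:
        if merged and region[0] - merged[-1][1] < min_duration_off:
            merged[-1] = (merged[-1][0], region[1])
        else:
            merged.append(region)

    # Remove speech regions shorter than `min_duration_on`.
    return [(s, e) for s, e in merged if e - s >= min_duration_on]
```

With `onset` higher than `offset`, a region starts only on a confident frame but survives through less confident ones, which is why the two thresholds are tuned separately.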

Overlapped speech detection

from pyannote.audio.pipelines import OverlappedSpeechDetection
pipeline = OverlappedSpeechDetection(segmentation="pyannote/segmentation")
# HYPER_PARAMETERS is the dictionary defined in the voice activity detection example
pipeline.instantiate(HYPER_PARAMETERS)
osd = pipeline("audio.wav")
# `osd` is a pyannote.core.Annotation instance containing overlapped speech regions
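
One way to picture overlapped speech detection is as a threshold on the second most active speaker. The sketch below assumes the raw output holds one activation score per speaker for each frame and scores overlap as the second-highest activation. The helper name `overlap_scores` and the sample values are hypothetical, and this is not claimed to be the pipeline's exact implementation.

```python
from typing import List


def overlap_scores(frames: List[List[float]]) -> List[float]:
    """Per-frame overlap score: the second-highest speaker activation."""
    result = []
    for activations in frames:
        ranked = sorted(activations, reverse=True)
        # With fewer than two speakers there can be no overlap.
        result.append(ranked[1] if len(ranked) > 1 else 0.0)
    return result


# Made-up activations for three frames and three speakers.
frames = [
    [0.9, 0.1, 0.0],  # one speaker active
    [0.8, 0.7, 0.1],  # two speakers active at once
    [0.1, 0.2, 0.0],  # nobody clearly active
]
scores = overlap_scores(frames)
overlapped = [score > 0.5 for score in scores]
```

A frame counts as overlapped only when at least two speakers are active, so thresholding the runner-up score is enough to flag it.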

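Resegmentation

from pyannote.audio.pipelines import Resegmentation
# `diarization="baseline"` names the key of the input dictionary that holds the baseline
pipeline = Resegmentation(segmentation="pyannote/segmentation",
                          diarization="baseline")
# HYPER_PARAMETERS is the dictionary defined in the voice activity detection example
pipeline.instantiate(HYPER_PARAMETERS)
resegmented_baseline = pipeline({"audio": "audio.wav", "baseline": baseline})
# where `baseline` should be provided as a pyannote.core.Annotation instance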

Raw scores

from pyannote.audio import Inference
inference = Inference("pyannote/segmentation")
segmentation = inference("audio.wav")
# `segmentation` is a pyannote.core.SlidingWindowFeature
# instance containing raw segmentation scores like the 
# one pictured above (output)
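
Raw scores are indexed by frame, so mapping a frame back to time is a common first step. The sketch below assumes the scores are attached to a sliding window described by a start time, a frame duration, and a step. The `Window` class is a hypothetical stand-in written for illustration, and the numeric values are made up rather than taken from the model.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Window:
    """Hypothetical stand-in for the sliding window attached to the scores."""
    start: float     # time of the first frame, in seconds
    duration: float  # length of each frame, in seconds
    step: float      # hop between consecutive frames, in seconds

    def frame_span(self, index: int) -> Tuple[float, float]:
        """Return the (start, end) time covered by frame `index`."""
        begin = self.start + index * self.step
        return begin, begin + self.duration


# Illustrative values only; the real ones come from the model.
window = Window(start=0.0, duration=1.0, step=0.5)
span = window.frame_span(3)
```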

Advanced usage

To reproduce the results of the paper "End-to-end speaker segmentation for overlap-aware resegmentation", use pyannote/segmentation@Interspeech2021 with the following hyperparameters.

Voice activity detection	`onset`	`offset`	`min_duration_on`	`min_duration_off`
AMI Mix-Headset	0.684	0.577	0.181	0.037
DIHARD3	0.767	0.377	0.136	0.067
VoxConverse	0.767	0.713	0.182	0.501

Overlapped speech detection	`onset`	`offset`	`min_duration_on`	`min_duration_off`
AMI Mix-Headset	0.448	0.362	0.116	0.187
DIHARD3	0.430	0.320	0.091	0.144
VoxConverse	0.587	0.426	0.337	0.112

Resegmentation of VBx	`onset`	`offset`	`min_duration_on`	`min_duration_off`
AMI Mix-Headset	0.542	0.527	0.044	0.705
DIHARD3	0.592	0.489	0.163	0.182
VoxConverse	0.537	0.724	0.410	0.563

Expected outputs (and the VBx baseline) are also provided in the /reproducible_research sub-directory.
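
For convenience, the three tables above can be collected into a single lookup. The values below are copied from the tables; the names `KEYS`, `TABLES`, and `paper_hyper_parameters`, as well as the task labels used as dictionary keys, are hypothetical choices made for this sketch.

```python
KEYS = ("onset", "offset", "min_duration_on", "min_duration_off")

# Values copied from the tables above, in the order given by KEYS.
TABLES = {
    "voice_activity_detection": {
        "AMI Mix-Headset": (0.684, 0.577, 0.181, 0.037),
        "DIHARD3": (0.767, 0.377, 0.136, 0.067),
        "VoxConverse": (0.767, 0.713, 0.182, 0.501),
    },
    "overlapped_speech_detection": {
        "AMI Mix-Headset": (0.448, 0.362, 0.116, 0.187),
        "DIHARD3": (0.430, 0.320, 0.091, 0.144),
        "VoxConverse": (0.587, 0.426, 0.337, 0.112),
    },
    "resegmentation": {
        "AMI Mix-Headset": (0.542, 0.527, 0.044, 0.705),
        "DIHARD3": (0.592, 0.489, 0.163, 0.182),
        "VoxConverse": (0.537, 0.724, 0.410, 0.563),
    },
}


def paper_hyper_parameters(task: str, dataset: str) -> dict:
    """Build a hyperparameter dictionary for one task and dataset."""
    return dict(zip(KEYS, TABLES[task][dataset]))
```

The returned dictionary has the same shape as `HYPER_PARAMETERS` in the basic usage examples, so it should be usable as the argument to `pipeline.instantiate(...)`.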

📚 Documentation

Support

For commercial enquiries and scientific consulting, please contact me.
For technical questions and bug reports, please check the pyannote.audio GitHub repository.

📄 License

This project is released under the MIT license.

Citation

@inproceedings{Bredin2021,
  Title = {{End-to-end speaker segmentation for overlap-aware resegmentation}},
  Author = {{Bredin}, Herv{\'e} and {Laurent}, Antoine},
  Booktitle = {Proc. Interspeech 2021},
  Address = {Brno, Czech Republic},
  Month = {August},
  Year = {2021},
}

@inproceedings{Bredin2020,
  Title = {{pyannote.audio: neural building blocks for speaker diarization}},
  Author = {{Bredin}, Herv{\'e} and {Yin}, Ruiqing and {Coria}, Juan Manuel and {Gelly}, Gregory and {Korshunov}, Pavel and {Lavechin}, Marvin and {Fustes}, Diego and {Titeux}, Hadrien and {Bouaziz}, Wassim and {Gill}, Marie-Philippe},
  Booktitle = {ICASSP 2020, IEEE International Conference on Acoustics, Speech, and Signal Processing},
  Address = {Barcelona, Spain},
  Month = {May},
  Year = {2020},
}