emotion-diarization-wavlm-large open-source model: speech emotion diarization with multi-class emotion labels

Emotion Diarization Wavlm Large

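Developed by speechbrain

A fine-tuned WavLM Large model for speech emotion diarization: it recognizes emotions in speech and locates where each one occurs, with support for multiple emotion classes

Audio classification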

PyTorch

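Language: English, License: Apache-2.0, Tags: #speech-emotion-recognition #multi-emotion-datasets #emotion-boundary-detection

Downloads: 1,128

Released: 7/4/2023

Model overview

Fine-tuned from the WavLM Large architecture, this model identifies the emotional components in a speech recording and determines their time boundaries. It is suited to emotion analysis and emotion diarization tasks.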

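Model features

Training on multiple emotion datasets

The model is trained on audio from five popular emotion datasets (including IEMOCAP and RAVDESS) and evaluated on the Zaion Emotion Dataset (ZED)

Time boundary detection

Beyond labeling the emotion type, the model also predicts the start and end time of each emotional segment

Reported performance

29.7% Emotion Diarization Error Rate (EDER) on the ZED test set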

Model capabilities

Speech emotion recognition

Emotion diarization

Emotion time boundary detection

Multi-class emotion classification

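Use cases

Emotion analysis

Customer service call analysis

Track how a customer's emotions change over the course of a support call

Identify key emotional moments such as anger or happiness

Psychological state assessment

Assess a speaker's psychological state from their voice

Detect emotional signs associated with depression or anxiety

Media analysis

Film and TV emotion analysis

Analyze how characters' emotions change throughout a film or series

Generate an emotion timeline to support content analysis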

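🚀 Emotion diarization with WavLM Large on 5 popular emotion datasets

This repository provides all the tools needed to perform speech emotion diarization with a fine-tuned WavLM (large) model using SpeechBrain.

The model is trained on concatenated audio and tested on the ZaionEmotionDataset (ZED). The evaluation metric is the Emotion Diarization Error Rate (EDER). See the paper (arXiv:2306.12991) for details.

For a better experience, we encourage you to learn more about SpeechBrain. The model's performance on the ZED test set is:

Version	EDER (%)
05-07-23	29.7 (avg: 30.2)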
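
To give a feel for what a duration-based error rate like EDER measures, the sketch below computes the fraction of an utterance's duration on which a predicted emotion label disagrees with the reference label. This is a simplified illustration under stated assumptions (one label at any moment, contiguous reference segments), not the official scoring script from the recipe; the function names `label_at` and `mismatch_rate` are hypothetical.

```python
def label_at(segments, t):
    """Return the emotion label covering time t, or None if no segment covers it."""
    for seg in segments:
        if seg["start"] <= t < seg["end"]:
            return seg["emotion"]
    return None


def mismatch_rate(reference, hypothesis):
    """Fraction of the reference duration where the hypothesis label differs.

    Simplified stand-in for a diarization error rate: it assumes exactly one
    label is active at any time in both reference and hypothesis.
    """
    # Every segment boundary splits the timeline into intervals with constant labels.
    bounds = sorted({seg[key] for seg in reference + hypothesis for key in ("start", "end")})
    total = max(seg["end"] for seg in reference) - min(seg["start"] for seg in reference)
    wrong = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        mid = (lo + hi) / 2
        ref_label = label_at(reference, mid)
        if ref_label is None:
            continue  # interval lies outside the annotated utterance
        if label_at(hypothesis, mid) != ref_label:
            wrong += hi - lo
    return wrong / total


reference = [
    {"start": 0.0, "end": 2.0, "emotion": "n"},
    {"start": 2.0, "end": 4.0, "emotion": "h"},
]
hypothesis = [
    {"start": 0.0, "end": 1.5, "emotion": "n"},
    {"start": 1.5, "end": 4.0, "emotion": "h"},
]
print(mismatch_rate(reference, hypothesis))
```

In this toy case the predicted boundary is 0.5 s early in a 4 s utterance, so 12.5% of the duration is mislabeled.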

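🚀 Quick start

The system consists of a WavLM encoder followed by a downstream frame-wise classifier. Its task is to predict the correct emotional components in a speech recording, together with their boundaries. For now, the model has been trained on audio containing only one non-neutral emotion event.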
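
The step from frame-wise predictions to emotion segments with boundaries can be sketched as follows. This is a minimal illustration, not SpeechBrain's implementation: the helper `frames_to_segments` is hypothetical, and the 20 ms frame shift is an assumption that should be checked against the model's actual stride.

```python
from itertools import groupby

FRAME_SHIFT_S = 0.02  # assumed frame shift in seconds (20 ms); verify for your model


def frames_to_segments(frame_labels, frame_shift=FRAME_SHIFT_S):
    """Merge runs of identical per-frame labels into start/end/emotion dicts."""
    segments = []
    index = 0  # index of the first frame in the current run
    for label, run in groupby(frame_labels):
        length = len(list(run))
        segments.append({
            "start": round(index * frame_shift, 2),
            "end": round((index + length) * frame_shift, 2),
            "emotion": label,
        })
        index += length
    return segments


# 97 neutral frames followed by 127 happy frames
frames = ["n"] * 97 + ["h"] * 127
print(frames_to_segments(frames))
```

With these inputs the sketch yields two segments, neutral from 0.0 s to 1.94 s and happy from 1.94 s to 4.48 s, which has the same shape as the dictionaries returned by diarize_file.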

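The system is trained on recordings sampled at 16 kHz (single channel). When you call diarize_file, the code automatically normalizes the audio as needed (i.e., resampling and mono channel selection).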
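
To make the two normalization steps concrete, here is a dependency-free sketch of channel selection and resampling. It only illustrates the idea and is not what SpeechBrain runs internally: keeping the first channel is an assumption, the linear interpolation has no anti-aliasing filter, and both function names are hypothetical.

```python
def to_mono(channels):
    """Keep the first channel of a list of per-channel sample lists (assumed policy)."""
    return list(channels[0])


def resample_linear(signal, src_rate, dst_rate=16000):
    """Naive linear-interpolation resampling, with no anti-aliasing filter."""
    if src_rate == dst_rate or not signal:
        return list(signal)
    n_out = int(len(signal) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate  # fractional index into the source signal
        left = int(pos)
        right = min(left + 1, len(signal) - 1)
        frac = pos - left
        out.append(signal[left] * (1 - frac) + signal[right] * frac)
    return out


stereo_32k = [[0.0, 1.0, 2.0, 3.0], [9.0, 9.0, 9.0, 9.0]]
mono_16k = resample_linear(to_mono(stereo_32k), src_rate=32000)
print(mono_16k)
```

For real audio, a proper resampler with low-pass filtering is preferable; in practice diarize_file already handles this step, so the sketch is only meant to show what "normalization" refers to.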

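✨ Key features

Provides tools for speech emotion diarization based on a fine-tuned WavLM (large) model.
Supports training and testing on several popular emotion datasets.
Uses the Emotion Diarization Error Rate (EDER) as the evaluation metric.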

📦 Installation

Install SpeechBrain

First, install the development version of SpeechBrain with the following commands:

git clone https://github.com/speechbrain/speechbrain.git
cd speechbrain
pip install -r requirements.txt
pip install --editable .

We encourage you to read our tutorials and learn more about SpeechBrain.

💻 Usage examples

Basic usage

from speechbrain.inference.diarization import Speech_Emotion_Diarization
classifier = Speech_Emotion_Diarization.from_hparams(
    source="speechbrain/emotion-diarization-wavlm-large"
)
diary = classifier.diarize_file("speechbrain/emotion-diarization-wavlm-large/example.wav")
print(diary)

# {
#    'speechbrain/emotion-diarization-wavlm-large/example.wav':
#       [
#          {'start': 0.0, 'end': 1.94, 'emotion': 'n'}, # n -> neutral
#          {'start': 1.94, 'end': 4.48, 'emotion': 'h'} # h -> happy
#       ]
# }

diary = classifier.diarize_file("speechbrain/emotion-diarization-wavlm-large/example_sad.wav")
print(diary)

# {
#    'speechbrain/emotion-diarization-wavlm-large/example_sad.wav':
#        [
#          {'start': 0.0, 'end': 3.54, 'emotion': 's'}, # s -> sad
#          {'start': 3.54, 'end': 5.26, 'emotion': 'n'} # n -> neutral
#        ]
# }

The output is a dictionary that maps each audio file to its list of emotion segments and their boundaries.
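
Once you have this dictionary, it can be post-processed with plain Python. The sketch below totals the time spent in each emotion. The helper `emotion_durations` is hypothetical, and the label map only covers the three codes shown in the example outputs above; any other code is passed through unchanged.

```python
from collections import defaultdict

# Only the label codes that appear in the example outputs; others are kept as-is.
EMOTION_NAMES = {"n": "neutral", "h": "happy", "s": "sad"}


def emotion_durations(diary):
    """Sum the duration (in seconds) of each emotion across all files in a diary."""
    totals = defaultdict(float)
    for segments in diary.values():
        for seg in segments:
            name = EMOTION_NAMES.get(seg["emotion"], seg["emotion"])
            totals[name] += seg["end"] - seg["start"]
    return {name: round(seconds, 2) for name, seconds in totals.items()}


# Same structure as the first example output above
diary = {
    "speechbrain/emotion-diarization-wavlm-large/example.wav": [
        {"start": 0.0, "end": 1.94, "emotion": "n"},
        {"start": 1.94, "end": 4.48, "emotion": "h"},
    ]
}
print(emotion_durations(diary))
```

The same loop can be adapted to filter for a single emotion, for example to list only the segments labeled 'h'.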

Advanced usage

Inference on GPU

To run inference on a GPU, add run_opts={"device":"cuda"} when calling the from_hparams method.
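
One way to wire this up is sketched below, falling back to the CPU when no CUDA device is available. The helpers `build_run_opts` and `load_classifier` are hypothetical names introduced for illustration; only `from_hparams`, `source`, and `run_opts` come from the usage shown in this document.

```python
def build_run_opts(prefer_gpu=True):
    """Return run_opts for from_hparams, using CUDA only when it is available."""
    if prefer_gpu:
        try:
            import torch  # imported lazily so the helper also works without torch
            if torch.cuda.is_available():
                return {"device": "cuda"}
        except ImportError:
            pass
    return {"device": "cpu"}


def load_classifier(prefer_gpu=True):
    """Load the emotion diarization model on the selected device."""
    from speechbrain.inference.diarization import Speech_Emotion_Diarization

    return Speech_Emotion_Diarization.from_hparams(
        source="speechbrain/emotion-diarization-wavlm-large",
        run_opts=build_run_opts(prefer_gpu),
    )


print(build_run_opts(prefer_gpu=False))
```

Calling load_classifier() downloads the model on first use, so it is left uncalled here.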

Training

The model was trained with SpeechBrain (commit aa018540). To train it from scratch, follow these steps:

Clone SpeechBrain:

git clone https://github.com/speechbrain/speechbrain/

Install it:

cd speechbrain
pip install -r requirements.txt
pip install -e .

Run training:

cd recipes/ZaionEmotionDataset/emotion_diarization
python train.py hparams/train.yaml --zed_folder /path/to/ZED --emovdb_folder /path/to/EmoV-DB --esd_folder /path/to/ESD --iemocap_folder /path/to/IEMOCAP --jlcorpus_folder /path/to/JL_corpus --ravdess_folder /path/to/RAVDESS

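You can find our training results (models, logs, etc.) here.

Limitations

The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.

📚 Documentation

About speech emotion diarization / Zaion Emotion Dataset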

@article{wang2023speech,
  title={Speech Emotion Diarization: Which Emotion Appears When?},
  author={Wang, Yingzhi and Ravanelli, Mirco and Nfissi, Alaa and Yacoubi, Alya},
  journal={arXiv preprint arXiv:2306.12991},
  year={2023}
}

Citing SpeechBrain

Please cite SpeechBrain if you use it for your research or business.

@misc{speechbrain,
  title={{SpeechBrain}: A General-Purpose Speech Toolkit},
  author={Mirco Ravanelli and Titouan Parcollet and Peter Plantinga and Aku Rouhe and Samuele Cornell and Loren Lugosch and Cem Subakan and Nauman Dawalatabad and Abdelwahab Heba and Jianyuan Zhong and Ju-Chieh Chou and Sung-Lin Yeh and Szu-Wei Fu and Chien-Feng Liao and Elena Rastorgueva and François Grondin and William Aris and Hwidong Na and Yan Gao and Renato De Mori and Yoshua Bengio},
  year={2021},
  eprint={2106.04624},
  archivePrefix={arXiv},
  primaryClass={eess.AS},
  note={arXiv:2106.04624}
}

About SpeechBrain

Website: https://speechbrain.github.io/
Code: https://github.com/speechbrain/speechbrain/
HuggingFace: https://huggingface.co/speechbrain/

📄 License

This project is released under the Apache 2.0 license.

Property	Details
Model type	Speech emotion diarization model based on fine-tuned WavLM (large)
Training data	ZaionEmotionDataset, IEMOCAP, RAVDESS, JL-Corpus, ESD, EmoV-DB
Evaluation metric	Emotion Diarization Error Rate (EDER)
License	apache-2.0