xlsr-wav2vec speech emotion recognition model: open-source, free recognition of five emotions, including anger and disgust

Xlsr Wav2vec Speech Emotion Recognition

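Developed by harshit345

A speech emotion recognition model based on the XLSR-Wav2Vec architecture. It recognizes five basic emotions: anger, disgust, fear, happiness, and sadness.

Audio Classification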

Transformers

Language: English · License: Apache-2.0 #SpeechEmotionRecognition #MultiEmotionClassification #HighAccuracyAudioAnalysis

Downloads: 498

Released: 3/2/2022

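Model Overview

The model uses the Wav2Vec2 architecture for speech emotion classification and is suited to identifying a speaker's emotional state from the speech signal.

Model Features

Multi-emotion recognition

Recognizes five basic emotions: anger, disgust, fear, happiness, and sadness.

Based on the Wav2Vec2 architecture

Builds on Wav2Vec2's self-supervised learning, which performs well on the speech emotion recognition task.

High accuracy

Overall accuracy on the test data is 80.6%, with balanced performance across the emotion classes.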
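The two figures quoted above, overall accuracy and per-class balance, can be checked on any labelled evaluation set with a few lines of code. The following is a minimal sketch, not the author's evaluation script. The helper name `accuracy_and_recall` and the toy labels are invented for illustration and are not the model's test data.

```python
from collections import Counter


def accuracy_and_recall(y_true, y_pred):
    """Return (overall accuracy, per-class recall) for two equal-length label lists."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = Counter(y_true)   # number of samples per true class
    correct = Counter()       # correctly predicted samples per true class
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            correct[truth] += 1

    accuracy = sum(correct.values()) / len(y_true)
    recall = {label: correct[label] / n for label, n in total.items()}
    return accuracy, recall


# Toy labels, invented for illustration only.
y_true = ["anger", "anger", "fear", "sadness", "happiness", "disgust"]
y_pred = ["anger", "disgust", "fear", "sadness", "happiness", "disgust"]

accuracy, recall = accuracy_and_recall(y_true, y_pred)
print(f"accuracy: {accuracy:.3f}")
for label, value in sorted(recall.items()):
    print(f"recall[{label}]: {value:.2f}")
```

A large gap between the per-class recall values would indicate that a headline accuracy figure hides weak classes.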

Model Capabilities

Speech emotion classification

Speech signal processing

Emotion probability scoring
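Probability scoring of this kind is typically a softmax over the classifier's raw outputs (logits). The following is a minimal pure-Python sketch of that step. The label order in `LABELS` and the logit values are assumptions made for illustration; in practice the label mapping comes from the model configuration.

```python
import math

# Assumed label order, for illustration only.
LABELS = ["anger", "disgust", "fear", "happiness", "sadness"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_scores(logits, labels=LABELS):
    """Pair each label with its probability, formatted as a percentage string."""
    probs = softmax(logits)
    return [{"Emotion": label, "Score": f"{p * 100:.1f}%"} for label, p in zip(labels, probs)]


# Made-up logits, not real model output.
scores = to_scores([2.0, 0.1, -0.7, -1.0, -3.1])
for row in scores:
    print(row)
```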

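Use Cases

Human-computer interaction

Emotion analysis for customer service systems

Analyzes the emotional state in customer speech to help customer service systems respond more intelligently.

Can accurately identify negative customer emotions such as anger and dissatisfaction.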
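One way a customer service system might act on the scores is to escalate a call when the combined probability of negative emotions crosses a threshold. The sketch below assumes the list-of-dicts output format shown in this page's example. The choice of `anger` and `disgust` as the negative labels, the 0.5 threshold, and the helper names are hypothetical, since the model has no dedicated dissatisfaction label.

```python
# Assumption: which labels count as "negative" for escalation purposes.
NEGATIVE = {"anger", "disgust"}


def parse_score(text):
    """Convert a score string such as '78.3%' into a fraction (0.783)."""
    return float(text.rstrip("%")) / 100.0


def needs_escalation(outputs, threshold=0.5):
    """Return True when the summed probability of negative emotions reaches the threshold."""
    negative_mass = sum(
        parse_score(item["Score"]) for item in outputs if item["Emotion"] in NEGATIVE
    )
    return negative_mass >= threshold


# Scores copied from the example output on this page.
example = [
    {"Emotion": "anger", "Score": "78.3%"},
    {"Emotion": "disgust", "Score": "11.7%"},
    {"Emotion": "fear", "Score": "5.4%"},
    {"Emotion": "happiness", "Score": "4.1%"},
    {"Emotion": "sadness", "Score": "0.5%"},
]

print(needs_escalation(example))
```

The threshold would need tuning against real call data before being relied on.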

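Mental health

Emotional state monitoring

Analyzes changes in a user's mood from everyday speech.

Can be used as an aid in diagnosing depression and other mental health conditions.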
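Tracking mood over time usually means smoothing noisy per-recording scores into a trend. The following is a minimal sketch of a rolling average over daily scores. The helper `rolling_mean`, the window size, and the daily values are all invented for illustration and are not produced by the model.

```python
from collections import deque


def rolling_mean(values, window):
    """Return the mean of the last `window` values at each position."""
    if window <= 0:
        raise ValueError("window must be positive")
    recent = deque(maxlen=window)  # keeps only the most recent `window` values
    means = []
    for value in values:
        recent.append(value)
        means.append(sum(recent) / len(recent))
    return means


# Hypothetical daily "sadness" probabilities, one per day.
daily_sadness = [0.10, 0.12, 0.30, 0.35, 0.40, 0.38, 0.45]
trend = rolling_mean(daily_sadness, window=3)
print([round(x, 3) for x in trend])
```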

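🚀 Audio Emotion Recognition Project

This project focuses on audio emotion recognition. It uses deep learning to classify the emotion expressed in audio, providing an efficient and accurate solution for emotion analysis of audio data.

🚀 Quick Start

Install dependencies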

!pip install git+https://github.com/huggingface/datasets.git
!pip install git+https://github.com/huggingface/transformers.git
!pip install torchaudio
!pip install librosa
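
After installation, it can be useful to confirm that the packages are actually visible to the current interpreter. The following is a small sketch using only the standard library. The helper name `installed_versions` is hypothetical, and the package list simply mirrors the install commands above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the install commands above.
REQUIRED = ["datasets", "transformers", "torchaudio", "librosa"]


def installed_versions(packages=REQUIRED):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```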

💻 Usage Examples

Basic usage

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchaudio
from transformers import AutoConfig, Wav2Vec2FeatureExtractor
import librosa
import IPython.display as ipd
import numpy as np
import pandas as pd

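# Run on the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_name_or_path = "harshit345/xlsr-wav2vec-speech-emotion-recognition"
config = AutoConfig.from_pretrained(model_name_or_path)
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_name_or_path)
sampling_rate = feature_extractor.sampling_rate
# NOTE: Wav2Vec2ForSpeechClassification is not exported by transformers and is not
# defined in this snippet; this custom class must be defined or imported first.
model = Wav2Vec2ForSpeechClassification.from_pretrained(model_name_or_path).to(device)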

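def speech_file_to_array_fn(path, sampling_rate):
    # Load the waveform and resample it to the rate the feature extractor expects.
    speech_array, _sampling_rate = torchaudio.load(path)
    resampler = torchaudio.transforms.Resample(orig_freq=_sampling_rate, new_freq=sampling_rate)
    speech = resampler(speech_array).squeeze().numpy()
    return speech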

def predict(path, sampling_rate):
    # Return one {"Emotion", "Score"} dict per label for the audio file at `path`.
    speech = speech_file_to_array_fn(path, sampling_rate)
    inputs = feature_extractor(speech, sampling_rate=sampling_rate, return_tensors="pt", padding=True)
    inputs = {key: inputs[key].to(device) for key in inputs}
    with torch.no_grad():
        logits = model(**inputs).logits
    # Softmax over the class dimension turns the logits into probabilities.
    scores = F.softmax(logits, dim=1).detach().cpu().numpy()[0]
    outputs = [{"Emotion": config.id2label[i], "Score": f"{score * 100:.1f}%"} for i, score in enumerate(scores)]
    return outputs

Advanced usage

# Predict the emotion of a given audio file
path = '/data/jtes_v1.1/wav/f01/ang/f01_ang_01.wav'
outputs = predict(path, sampling_rate)
print(outputs)

Output:

[{'Emotion': 'anger', 'Score': '78.3%'},
 {'Emotion': 'disgust', 'Score': '11.7%'},
 {'Emotion': 'fear', 'Score': '5.4%'},
 {'Emotion': 'happiness', 'Score': '4.1%'},
 {'Emotion': 'sadness', 'Score': '0.5%'}]
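
Downstream code often needs only the most likely label rather than the full list. The following is a minimal sketch that ranks a result in the format shown above. The helper name `top_emotions` is hypothetical, and the sample data is copied from the output above.

```python
def top_emotions(outputs, k=1):
    """Return the k highest-scoring (emotion, percent) pairs, best first."""

    def percent(item):
        # "78.3%" -> 78.3
        return float(item["Score"].rstrip("%"))

    ranked = sorted(outputs, key=percent, reverse=True)
    return [(item["Emotion"], percent(item)) for item in ranked[:k]]


# Result copied from the example output above.
sample = [
    {"Emotion": "anger", "Score": "78.3%"},
    {"Emotion": "disgust", "Score": "11.7%"},
    {"Emotion": "fear", "Score": "5.4%"},
    {"Emotion": "happiness", "Score": "4.1%"},
    {"Emotion": "sadness", "Score": "0.5%"},
]

print(top_emotions(sample))
print(top_emotions(sample, k=2))
```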