hubert-large-turkish-speech-emotion-recognition Open-Source Model


Hubert Large Turkish Speech Emotion Recognition

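Developed by SeaBenSea

A Turkish speech emotion recognition model based on the HuBERT architecture. It was trained on the TurEV-DB dataset and recognizes four emotions: angry, calm, happy, and sad.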

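Audio Classification

Transformers

License: Apache-2.0 | Tags: #Turkish speech emotion recognition #HuBERT large model #high-accuracy emotion classification

Downloads: 95

Release date: 6/25/2024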

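Model Overview

This model uses the HuBERT architecture for Turkish speech emotion recognition. Given a Turkish speech recording, it classifies the speaker's emotion into one of four basic categories.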
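To make the four-class setup concrete, here is a minimal sketch of how a classifier's raw scores (logits) become a single emotion label: a softmax turns the logits into probabilities, and the highest probability wins. The logit values are made up for illustration, and the label order (Angry, Calm, Happy, Sad) is an assumption taken from the example output at the end of this card; in practice the mapping comes from the model's `config.id2label`.

```python
import math

# Assumed index -> emotion mapping; the real one lives in config.id2label.
LABELS = ["Angry", "Calm", "Happy", "Sad"]


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Return (label, probability) for the highest-scoring emotion."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return LABELS[best], probs[best]


# Hypothetical logits for a single utterance
label, prob = classify([4.2, -1.0, 0.3, -0.5])
print(label, f"{prob:.3f}")
```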

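Model Features

High-accuracy emotion recognition

Achieves 95% overall accuracy on the TurEV-DB dataset, with an F1 score of 0.98 for the Angry class

Turkish-specific

An emotion recognition model optimized specifically for Turkish speech

Multi-class emotion classification

Recognizes four basic emotions: angry, calm, happy, and sad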
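The accuracy and F1 figures above are standard classification metrics. As a reminder of how they are computed, here is a minimal sketch over a handful of made-up labels (these are not TurEV-DB results): accuracy is the fraction of correct predictions, and per-class F1 is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R).

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_for_class(y_true, y_pred, label):
    """Per-class F1: harmonic mean of precision and recall for `label`."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical ground truth and predictions for six clips
y_true = ["Angry", "Angry", "Calm", "Happy", "Sad", "Sad"]
y_pred = ["Angry", "Angry", "Calm", "Sad", "Sad", "Happy"]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
for emotion in ["Angry", "Calm", "Happy", "Sad"]:
    print(emotion, f"F1 = {f1_for_class(y_true, y_pred, emotion):.2f}")
```

For real evaluations, `sklearn.metrics.f1_score` and `classification_report` compute the same quantities.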

Model Capabilities

Turkish speech emotion recognition

Speech emotion classification

Speech signal processing

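Use Cases

Emotion analysis

Customer service call analysis

Analyze customers' emotional states during customer service calls

Detects customer anger, which can help improve service quality

Mental health monitoring

Analyze a user's emotional state from their voice

Could support early identification of mental health conditions such as depression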
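As an illustration of the customer-service use case, the sketch below flags call segments whose Angry score exceeds a threshold. It assumes each segment has already been scored with the `predict()` function shown later in this card, which returns a list of `{"Emotion": ..., "Score": "99.8%"}` dicts. The helper names, the segment IDs, and the 80% threshold are all hypothetical choices, not part of the model or its repository.

```python
def parse_score(score):
    """Turn a percentage string such as '99.8%' into a float in [0, 100]."""
    return float(score.rstrip("%"))


def flag_angry_segments(segment_predictions, threshold=80.0):
    """Return IDs of segments whose 'Angry' score is at least `threshold` percent.

    `segment_predictions` maps a segment ID to the list of dicts returned
    by predict() for that segment (hypothetical input layout).
    """
    flagged = []
    for segment_id, outputs in segment_predictions.items():
        scores = {o["Emotion"]: parse_score(o["Score"]) for o in outputs}
        if scores.get("Angry", 0.0) >= threshold:
            flagged.append(segment_id)
    return flagged


# Hypothetical scores for two segments of one call
calls = {
    "call-001/seg-03": [
        {"Emotion": "Angry", "Score": "99.8%"},
        {"Emotion": "Calm", "Score": "0.0%"},
        {"Emotion": "Happy", "Score": "0.1%"},
        {"Emotion": "Sad", "Score": "0.1%"},
    ],
    "call-001/seg-04": [
        {"Emotion": "Angry", "Score": "12.0%"},
        {"Emotion": "Calm", "Score": "85.0%"},
        {"Emotion": "Happy", "Score": "2.0%"},
        {"Emotion": "Sad", "Score": "1.0%"},
    ],
}
print(flag_angry_segments(calls))
```

A fixed threshold is the simplest policy; in practice it would need tuning against labeled calls.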

🚀 Turkish Speech Emotion Recognition with HuBERT

This project implements Turkish speech emotion recognition (SER) using a HuBERT model trained on the TurEV-DB dataset.

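🚀 Quick Start

✨ Key Features

Built on HuBERT, reaching 95% overall accuracy on Turkish speech emotion recognition (TurEV-DB).
Recognizes four emotions: angry, calm, happy, and sad.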

📦 Installation

Install dependencies

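# Install the required packages
# (the leading "!" runs a shell command from a Jupyter/Colab notebook; drop it in a terminal)
!pip install git+https://github.com/huggingface/datasets.git
!pip install git+https://github.com/huggingface/transformers.git
!pip install torchaudio
!pip install librosa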
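Before moving on, it can help to confirm that the packages are importable in the active environment. This is a small sketch using the standard library's `importlib.util.find_spec`; the `missing_packages` helper and the package list (which adds `torch`, a dependency of `torchaudio`) are illustrative, not part of the project.

```python
import importlib.util

# pip package name -> top-level module name (identical for these packages)
REQUIRED = {
    "datasets": "datasets",
    "transformers": "transformers",
    "torch": "torch",
    "torchaudio": "torchaudio",
    "librosa": "librosa",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages that cannot be found in this environment."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


missing = missing_packages()
print("All dependencies found" if not missing else f"Missing: {', '.join(missing)}")
```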

Clone the project repository

!git clone https://github.com/SeaBenSea/HuBERT-SER.git

💻 Usage Examples

Basic usage: imports

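import sys
sys.path.insert(1, './HuBERT-SER/')  # make the cloned repository's src package importable

import torch
import torch.nn.functional as F  # softmax over the model's logits
import torchaudio  # audio loading and resampling
from transformers import AutoConfig, Wav2Vec2FeatureExtractor
# Classification heads defined in the HuBERT-SER repository (not in transformers itself)
from src.models import Wav2Vec2ForSpeechClassification, HubertForSpeechClassification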
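`sys.path.insert(1, './HuBERT-SER/')` only works if the repository was cloned into the current working directory; otherwise the `from src.models import ...` line fails with a `ModuleNotFoundError` that does not say why. The sketch below is a hypothetical helper (`add_repo_to_path` is not part of the repository) that checks for the `src` directory the import expects before adding the clone to the import path.

```python
import os
import sys


def add_repo_to_path(repo_dir="./HuBERT-SER"):
    """Add a cloned HuBERT-SER checkout to sys.path, failing early if it is missing."""
    src_dir = os.path.join(repo_dir, "src")
    if not os.path.isdir(src_dir):
        raise FileNotFoundError(
            f"{src_dir} not found; clone the repository first with "
            "git clone https://github.com/SeaBenSea/HuBERT-SER.git"
        )
    repo_path = os.path.abspath(repo_dir)
    if repo_path not in sys.path:
        sys.path.insert(1, repo_path)  # same position as the original snippet
    return repo_path
```

Calling `add_repo_to_path()` in place of the bare `sys.path.insert` line gives a clear error message when the clone step was skipped.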

Advanced usage: loading the model and running inference

model_name_or_path = "SeaBenSea/hubert-large-turkish-speech-emotion-recognition"

# Run on the GPU if one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
config = AutoConfig.from_pretrained(model_name_or_path)  # holds the id2label mapping
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_name_or_path)
sampling_rate = feature_extractor.sampling_rate  # sample rate the model expects

model = HubertForSpeechClassification.from_pretrained(model_name_or_path).to(device)

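def speech_file_to_array_fn(path, sampling_rate):
    # torchaudio.load returns a (channels, frames) tensor and the file's sample rate
    speech_array, _sampling_rate = torchaudio.load(path)
    # Downmix multi-channel (e.g. stereo) audio to mono; squeeze() alone would leave 2 channels
    speech_array = speech_array.mean(dim=0)
    # Resample to the rate the feature extractor expects
    resampler = torchaudio.transforms.Resample(_sampling_rate, sampling_rate)
    speech = resampler(speech_array).numpy()
    return speech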


def predict(path, sampling_rate):
    speech = speech_file_to_array_fn(path, sampling_rate)
    inputs = feature_extractor(speech, sampling_rate=sampling_rate, return_tensors="pt", padding=True)
    inputs = {key: inputs[key].to(device) for key in inputs}

    with torch.no_grad():
        logits = model(**inputs).logits

    # Probabilities for the single input in the batch
    scores = F.softmax(logits, dim=1).cpu().numpy()[0]
    # Format each probability as a percentage with one decimal place
    outputs = [{"Emotion": config.id2label[i], "Score": f"{score * 100:.1f}%"}
               for i, score in enumerate(scores)]
    return outputs

# Example file from TurEV-DB; replace with the path to your own .wav file
path = "../dataset/TurEV/Angry/1157_kz_acik.wav"
outputs = predict(path, sampling_rate)
print(outputs)
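`speech_file_to_array_fn` does two things before feature extraction: it reduces the waveform to a single channel and resamples it to `feature_extractor.sampling_rate` (HuBERT checkpoints are typically trained on 16 kHz audio). The sketch below spells out both steps on plain Python lists. Note that it swaps in naive linear interpolation for resampling, which is simpler than, and not equivalent to, the band-limited filtering that `torchaudio.transforms.Resample` performs, so treat it as an illustration rather than a drop-in replacement.

```python
def downmix(channels):
    """Average a list of equal-length channels into one mono channel."""
    n = len(channels)
    return [sum(samples) / n for samples in zip(*channels)]


def linear_resample(signal, orig_sr, target_sr):
    """Resample by linear interpolation (naive: no anti-aliasing filter)."""
    if orig_sr == target_sr or not signal:
        return list(signal)
    out_len = int(round(len(signal) * target_sr / orig_sr))
    out = []
    for i in range(out_len):
        pos = i * orig_sr / target_sr  # fractional position in the input
        left = int(pos)
        frac = pos - left
        right = min(left + 1, len(signal) - 1)  # clamp at the last sample
        out.append(signal[left] * (1 - frac) + signal[right] * frac)
    return out


# Hypothetical stereo clip at 32 kHz, downmixed and resampled to 16 kHz
stereo = [[0.2, 0.4, 0.6, 0.8, 1.0, 0.8, 0.6, 0.4],
          [0.0, 0.2, 0.4, 0.6, 0.8, 0.6, 0.4, 0.2]]
mono = downmix(stereo)
print(linear_resample(mono, 32000, 16000))
```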

Example prediction output

[
  {'Emotion': 'Angry', 'Score': '99.8%'},
  {'Emotion': 'Calm', 'Score': '0.0%'},
  {'Emotion': 'Happy', 'Score': '0.1%'},
  {'Emotion': 'Sad', 'Score': '0.1%'}
]
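Because `predict()` returns scores as formatted strings, picking the final label takes one parsing step. A minimal sketch, where `top_emotion` is a hypothetical helper applied to the example output above:

```python
def top_emotion(outputs):
    """Return (emotion, percent) for the highest score in predict()'s output."""
    best = max(outputs, key=lambda o: float(o["Score"].rstrip("%")))
    return best["Emotion"], float(best["Score"].rstrip("%"))


# The example output shown above
outputs = [
    {"Emotion": "Angry", "Score": "99.8%"},
    {"Emotion": "Calm", "Score": "0.0%"},
    {"Emotion": "Happy", "Score": "0.1%"},
    {"Emotion": "Sad", "Score": "0.1%"},
]
emotion, percent = top_emotion(outputs)
print(f"{emotion}: {percent}%")
```

If the scores feed further computation, returning the raw floats from `predict()` and formatting only for display would avoid this round trip.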