hubert-large-turkish-speech-emotion-recognition open-source model

Hubert Large Turkish Speech Emotion Recognition

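Developed by SeaBenSea

A Turkish speech emotion recognition model based on the HuBERT architecture. It was trained on the TurEV-DB dataset and recognizes four emotions: angry, calm, happy, and sad.

Audio Classification

Transformers

License: Apache-2.0 #Turkish speech emotion recognition #HuBERT large model #High-accuracy emotion classification

Downloads: 95

Released: 6/25/2024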

Model Overview

The model uses the HuBERT architecture for Turkish speech emotion recognition. It classifies input Turkish speech into one of four basic emotions.

Model Features

High-accuracy emotion recognition

Reaches 95% overall accuracy on the TurEV-DB dataset, with an F1 score of 0.98 for the angry class.
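
For readers unfamiliar with the two metrics quoted above, here is a minimal sketch of how overall accuracy and per-class F1 are computed from reference and predicted labels. The label lists are hypothetical toy data, not TurEV-DB results, and the helper names `accuracy` and `f1_per_class` are illustrative rather than part of the project.

```python
from collections import Counter

LABELS = ["Angry", "Calm", "Happy", "Sad"]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the reference label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_per_class(y_true, y_pred, labels=LABELS):
    """Per-class F1, computed as 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # class p was predicted, but wrongly
            fn[t] += 1  # true class t was missed
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


# Hypothetical toy labels for illustration only (NOT TurEV-DB results).
y_true = ["Angry", "Angry", "Calm", "Happy", "Sad", "Sad"]
y_pred = ["Angry", "Angry", "Calm", "Sad", "Sad", "Sad"]

print(accuracy(y_true, y_pred))
print(f1_per_class(y_true, y_pred))
```

Accuracy summarizes all classes at once, while per-class F1 shows whether one emotion (here the toy "Happy" class) is being missed even when the overall number looks good.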

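Turkish-specific

An emotion recognition model optimized specifically for Turkish speech.

Multi-class emotion classification

Recognizes four basic emotions: angry, calm, happy, and sad.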

Model Capabilities

Turkish speech emotion recognition

Speech emotion classification

Speech signal processing

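Use Cases

Emotion analysis

Customer service call analysis

Analyzes the customer's emotional state during customer service calls.

Can detect customer anger to help improve service quality.

Mental health monitoring

Analyzes a user's emotional state from speech.

May assist in the early identification of mental health conditions such as depression.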

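🚀 Turkish Speech Emotion Recognition with HuBERT

This project implements Turkish speech emotion recognition (SER) using a HuBERT model trained on the TurEV-DB dataset.

🚀 Quick Start

✨ Key Features

Based on the HuBERT model, it performs well on the Turkish speech emotion recognition task.
Accurately recognizes four emotions: angry, calm, happy, and sad.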

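📦 Installation

Installing dependencies

# Install the required packages (the leading "!" runs shell commands from a notebook cell)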
!pip install git+https://github.com/huggingface/datasets.git
!pip install git+https://github.com/huggingface/transformers.git
!pip install torchaudio
!pip install librosa

Clone the project repository

!git clone https://github.com/SeaBenSea/HuBERT-SER.git

💻 Usage Examples

Basic usage

import sys
# Make the cloned repository importable so that src.models can be found
sys.path.insert(1, './HuBERT-SER/')
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchaudio
from transformers import AutoConfig, Wav2Vec2FeatureExtractor
from src.models import Wav2Vec2ForSpeechClassification, HubertForSpeechClassification

Advanced usage

model_name_or_path = "SeaBenSea/hubert-large-turkish-speech-emotion-recognition"

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
config = AutoConfig.from_pretrained(model_name_or_path)
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_name_or_path)
sampling_rate = feature_extractor.sampling_rate

model = HubertForSpeechClassification.from_pretrained(model_name_or_path).to(device)
model.eval()  # inference only: make sure dropout is disabled


def speech_file_to_array_fn(path, sampling_rate):
    # Load the audio and resample it to the rate the feature extractor expects
    speech_array, _sampling_rate = torchaudio.load(path)
    resampler = torchaudio.transforms.Resample(_sampling_rate, sampling_rate)
    # Average the channels so that stereo files also yield a 1-D waveform
    speech = resampler(speech_array).mean(dim=0).numpy()
    return speech


def predict(path, sampling_rate):
    speech = speech_file_to_array_fn(path, sampling_rate)
    inputs = feature_extractor(speech, sampling_rate=sampling_rate, return_tensors="pt", padding=True)
    inputs = {key: inputs[key].to(device) for key in inputs}

    with torch.no_grad():
        logits = model(**inputs).logits

    # Convert logits to probabilities and report each class as a percentage
    scores = F.softmax(logits, dim=1).cpu().numpy()[0]
    outputs = [{"Emotion": config.id2label[i], "Score": f"{score * 100:.1f}%"}
               for i, score in enumerate(scores)]
    return outputs


path = "../dataset/TurEV/Angry/1157_kz_acik.wav"
outputs = predict(path, sampling_rate)
outputs
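
The last step of `predict` turns raw logits into percentage strings via softmax. The following is a minimal pure-Python sketch of that step, so it can be run without the model. The logits and the `id2label` mapping below are made up for illustration; the real mapping comes from the model's `config.id2label`.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical label map and logits, for illustration only.
id2label = {0: "Angry", 1: "Calm", 2: "Happy", 3: "Sad"}
logits = [2.0, 0.0, 0.0, 0.0]

scores = softmax(logits)
outputs = [{"Emotion": id2label[i], "Score": f"{s * 100:.1f}%"}
           for i, s in enumerate(scores)]
print(outputs)
```

Because softmax normalizes the scores to sum to 1, the percentages describe how the model splits its confidence across the four classes; they are not independent per-class probabilities.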

Example prediction output

[
  {'Emotion': 'Angry', 'Score': '99.8%'},
  {'Emotion': 'Calm', 'Score': '0.0%'},
  {'Emotion': 'Happy', 'Score': '0.1%'},
  {'Emotion': 'Sad', 'Score': '0.1%'}
]
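
Downstream code usually needs a single label rather than the full list. As a small sketch, assuming the list-of-dicts format shown above, the hypothetical helper `top_emotion` (not part of the project) picks the highest-scoring entry by parsing the percentage strings.

```python
def top_emotion(outputs):
    """Return (label, percent) for the highest-scoring entry in the list."""
    best = max(outputs, key=lambda o: float(o["Score"].rstrip("%")))
    return best["Emotion"], float(best["Score"].rstrip("%"))


# The example prediction shown above, copied verbatim.
example = [
    {"Emotion": "Angry", "Score": "99.8%"},
    {"Emotion": "Calm", "Score": "0.0%"},
    {"Emotion": "Happy", "Score": "0.1%"},
    {"Emotion": "Sad", "Score": "0.1%"},
]

label, percent = top_emotion(example)
print(label, percent)
```

Parsing formatted strings loses precision, so if the raw probabilities are needed it may be simpler to have `predict` also return the unformatted scores.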