Open-source wav2vec2-xlsr-53-russian-emotion-recognition model

Wav2vec2 Xlsr 53 Russian Emotion Recognition

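Developed by Aniemore

A Russian speech emotion recognition model built on the XLS-R Wav2Vec2 architecture. It distinguishes 7 basic emotions with 72% accuracy.

Audio classification

Transformers

Open-source license: MIT #Russian speech emotion recognition #Multi-emotion classification #Wav2Vec2 architecture

Downloads: 1,106

Released: May 22, 2022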

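Model overview

The model is designed for emotion recognition in Russian speech. Given an audio file, it classifies the speaker's emotion as anger, disgust, enthusiasm, fear, happiness, neutral, or sadness.

Model features

Emotion recognition accuracy

Reaches 72% accuracy on a Russian emotional speech dataset

Multi-emotion classification

Distinguishes 7 emotional states

Based on the Wav2Vec2 architecture

Builds on the cross-lingual speech representations that XLS-R Wav2Vec2 learns through self-supervised pretraining

Model capabilities

Russian speech emotion recognition

Audio emotion classification

Speech emotion analysis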

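Use cases

Human-computer interaction

Customer service emotion analysis

Analyze customer emotions in customer service calls

Can flag customer dissatisfaction to help improve service quality

Mental health

Emotional state monitoring

Analyze a user's emotional state from their speech

Can support emotion monitoring in mental health applications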

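🚀 XLS-R Wav2Vec2 for Russian speech emotion classification

This project provides a model based on XLS-R Wav2Vec2 for emotion classification of Russian speech. It recognizes seven emotions: anger, disgust, enthusiasm, fear, happiness, neutral, and sadness.

🚀 Quick start

Setup and imports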

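import torch
import torch.nn.functional as F
import torchaudio
from transformers import AutoConfig, AutoModel, Wav2Vec2FeatureExtractor


def speech_file_to_array_fn(path, sampling_rate):
    """Load an audio file as a mono 1-D NumPy array at `sampling_rate` Hz."""
    speech_array, _sampling_rate = torchaudio.load(path)  # shape: (channels, frames)
    # Resample from the file's native rate to the requested rate.
    resampler = torchaudio.transforms.Resample(orig_freq=_sampling_rate, new_freq=sampling_rate)
    # Average the channels so that stereo files also yield a 1-D array.
    speech = resampler(speech_array).mean(dim=0).numpy()
    return speech


def predict(path, sampling_rate):
    """Return one {"Emotion", "Score"} dict per label for the audio file at `path`."""
    # Uses the globals feature_extractor, model_, config, and device
    # that are created in the model-loading step below.
    speech = speech_file_to_array_fn(path, sampling_rate)
    inputs = feature_extractor(speech, sampling_rate=sampling_rate, return_tensors="pt", padding=True)
    inputs = {key: inputs[key].to(device) for key in inputs}

    with torch.no_grad():
        logits = model_(**inputs).logits

    # Softmax over the label dimension; [0] selects the single item in the batch.
    scores = F.softmax(logits, dim=1).detach().cpu().numpy()[0]
    outputs = [{"Emotion": config.id2label[i], "Score": f"{score * 100:.1f}%"} for i, score in enumerate(scores)]
    return outputs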

Model loading

# trust_remote_code=True runs Python code shipped in the model repository;
# review that code before enabling it.
TRUST = True

config = AutoConfig.from_pretrained("Aniemore/wav2vec2-xlsr-53-russian-emotion-recognition", trust_remote_code=TRUST)
model_ = AutoModel.from_pretrained("Aniemore/wav2vec2-xlsr-53-russian-emotion-recognition", trust_remote_code=TRUST)
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("Aniemore/wav2vec2-xlsr-53-russian-emotion-recognition")

# Run on the GPU when one is available, otherwise on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_.to(device)

💻 Usage examples

Basic usage

result = predict("/path/to/russian_audio_speech.wav", 16000)
print(result)

Example output

# outputs
[{'Emotion': 'anger', 'Score': '0.0%'},
 {'Emotion': 'disgust', 'Score': '100.0%'},
 {'Emotion': 'enthusiasm', 'Score': '0.0%'},
 {'Emotion': 'fear', 'Score': '0.0%'},
 {'Emotion': 'happiness', 'Score': '0.0%'},
 {'Emotion': 'neutral', 'Score': '0.0%'},
 {'Emotion': 'sadness', 'Score': '0.0%'}]
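
Because the scores come back as formatted strings, ranking the emotions means converting them to numbers first. The sketch below shows one way to pick the top label. `top_emotion` is a hypothetical helper, not part of the model repository, and it assumes the output format shown above.

```python
def top_emotion(outputs):
    """Return (label, percent) for the highest-scoring entry.

    Hypothetical helper. Assumes each item looks like
    {"Emotion": "anger", "Score": "12.3%"}, as in the example output.
    """
    # Strip the trailing "%" and convert each score to a float.
    parsed = [(item["Emotion"], float(item["Score"].rstrip("%"))) for item in outputs]
    # Pick the pair with the largest numeric score.
    return max(parsed, key=lambda pair: pair[1])


sample = [
    {"Emotion": "anger", "Score": "0.0%"},
    {"Emotion": "disgust", "Score": "100.0%"},
    {"Emotion": "enthusiasm", "Score": "0.0%"},
]
print(top_emotion(sample))  # ('disgust', 100.0)
```

If downstream code needs the raw probabilities, it may be simpler to change the prediction function to return floats and format them only for display.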

📚 Detailed documentation

Model information

Property	Details
Model type	XLS-R Wav2Vec2 for Russian speech emotion classification
Training data	Aniemore/resd

Evaluation results

Emotion	Precision	Recall	F1-score	Support
anger	0.97	0.86	0.92	44
disgust	0.71	0.78	0.74	37
enthusiasm	0.51	0.80	0.62	40
fear	0.80	0.62	0.70	45
happiness	0.66	0.70	0.68	44
neutral	0.81	0.66	0.72	38
sadness	0.79	0.59	0.68	32
accuracy			0.72	280
macro avg	0.75	0.72	0.72	280
weighted avg	0.75	0.72	0.73	280
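
The two average rows can be re-derived from the per-class rows: the macro average is the plain mean over the seven classes, and the weighted average weights each class by its sample count. The sketch below checks this using the rounded values copied from the table. The names `report`, `macro_average`, and `weighted_average` are illustrative and not part of any library.

```python
# Per-class (precision, recall, f1, support), copied from the table above.
report = {
    "anger": (0.97, 0.86, 0.92, 44),
    "disgust": (0.71, 0.78, 0.74, 37),
    "enthusiasm": (0.51, 0.80, 0.62, 40),
    "fear": (0.80, 0.62, 0.70, 45),
    "happiness": (0.66, 0.70, 0.68, 44),
    "neutral": (0.81, 0.66, 0.72, 38),
    "sadness": (0.79, 0.59, 0.68, 32),
}


def macro_average(rows):
    """Unweighted mean of precision, recall, and F1 across classes."""
    n = len(rows)
    return tuple(round(sum(r[i] for r in rows.values()) / n, 2) for i in range(3))


def weighted_average(rows):
    """Mean of precision, recall, and F1 weighted by each class's support."""
    total = sum(r[3] for r in rows.values())
    return tuple(round(sum(r[i] * r[3] for r in rows.values()) / total, 2) for i in range(3))


print(macro_average(report))     # (0.75, 0.72, 0.72)
print(weighted_average(report))  # (0.75, 0.72, 0.73)
```

Both results match the reported rows, and the supports sum to 280, the total in the table.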

📄 License

This project is released under the MIT License.

📖 Citation

@misc{Aniemore,
  author = {Артем Аментес, Илья Лубенец, Никита Давидчук},
  title = {Открытая библиотека искусственного интеллекта для анализа и выявления эмоциональных оттенков речи человека},
  year = {2022},
  publisher = {Hugging Face},
  journal = {Hugging Face Hub},
  howpublished = {\url{https://huggingface.com/aniemore/Aniemore}},
  email = {hello@socialcode.ru}
}