Wav2vec2 Xlsr 53 Russian Emotion Recognition

Developed by Aniemore

This Russian speech emotion recognition model is based on the XLS-R Wav2Vec2 architecture. It classifies speech into 7 basic emotions and reaches 72% accuracy on the Russian Emotional Speech Dialogs (RESD) dataset.

Audio Classification

Transformers

Open Source License: MIT

Tags: Russian Speech Emotion Recognition, Multi-emotion Classification, Wav2Vec2 Architecture

Downloads 1,106

Release Time: 5/22/2022

Model Overview

This model is designed for emotion recognition in Russian speech. It analyzes an audio file and scores it against seven classes: anger, disgust, enthusiasm, fear, happiness, neutral, and sadness.

Model Features

High-precision Emotion Recognition

Achieves 72% accuracy on 280 evaluation samples from the Russian Emotional Speech Dialogs (Aniemore/resd) dataset

Multi-emotion Classification

Capable of identifying 7 different emotional states

Based on Wav2Vec2 Architecture

Builds on the self-supervised speech representations of the multilingual wav2vec 2.0 XLSR-53 checkpoint

Model Capabilities

Russian Speech Emotion Recognition

Audio Emotion Classification

Speech Emotion Analysis

Use Cases

Human-Computer Interaction

Customer Service Emotion Analysis

Analyze customer emotions in service calls

Can identify customer dissatisfaction to improve service quality
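As a rough illustration of the customer service use case, the sketch below aggregates per-segment predictions for one call into a single flag. It assumes the call has already been split into segments and that each segment's top emotion label is available. The set of "negative" labels, the function names, and the 0.3 threshold are all hypothetical choices for illustration, not part of the model.

```python
from collections import Counter

# Assumption: which labels signal dissatisfaction is an application decision.
NEGATIVE = {"anger", "disgust"}


def negative_share(segment_labels):
    """Return the fraction of segments whose top emotion is in NEGATIVE."""
    if not segment_labels:
        return 0.0
    counts = Counter(segment_labels)
    return sum(counts[label] for label in NEGATIVE) / len(segment_labels)


def flag_call(segment_labels, threshold=0.3):
    """Flag a call when the negative share reaches the (hypothetical) threshold."""
    return negative_share(segment_labels) >= threshold


# Hypothetical top labels for ten consecutive segments of one call.
call = ["neutral", "neutral", "anger", "anger", "sadness",
        "neutral", "disgust", "anger", "happiness", "neutral"]

print(negative_share(call), flag_call(call))
```

A ratio over segments is only one possible aggregation. Averaging the per-class scores across segments would keep more information at the cost of a slightly longer helper.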

Mental Health

Emotional State Monitoring

Analyze user emotional states through speech

Can be used for emotional monitoring in mental health applications

🚀 XLS-R Wav2Vec2 For Russian Speech Emotion Classification

This model classifies emotions in Russian speech audio. It is built on the XLS-R Wav2Vec2 architecture.

✨ Features

Tags: audio-classification, audio, emotion, emotion-recognition, emotion-classification, speech
License: MIT
Datasets: Aniemore/resd

📦 Installation

The model card does not list installation steps. The usage code below imports torch, torchaudio, transformers, librosa, and numpy, which can be installed with pip:

pip install torch torchaudio transformers librosa numpy

💻 Usage Examples

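Basic Usage

The helper functions below load an audio file and run inference. They rely on the globals config, model_, feature_extractor, and device, which are created under Advanced Usage, so run that loading code before calling predict.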

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchaudio
from transformers import AutoConfig, AutoModel, Wav2Vec2FeatureExtractor

import librosa
import numpy as np


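def speech_file_to_array_fn(path, sampling_rate):
    # Load the waveform as a (channels, samples) tensor plus its native rate.
    speech_array, _sampling_rate = torchaudio.load(path)
    # Resample from the file's rate to the rate the feature extractor expects.
    resampler = torchaudio.transforms.Resample(_sampling_rate, sampling_rate)
    # Average the channels so that stereo files also yield a 1-D array.
    speech = resampler(speech_array).mean(dim=0).numpy()
    return speech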


def predict(path, sampling_rate):
    # Relies on the globals feature_extractor, model_, config, and device.
    speech = speech_file_to_array_fn(path, sampling_rate)
    inputs = feature_extractor(speech, sampling_rate=sampling_rate, return_tensors="pt", padding=True)
    inputs = {key: inputs[key].to(device) for key in inputs}

    # Inference only, so gradient tracking is disabled.
    with torch.no_grad():
        logits = model_(**inputs).logits

    # Softmax over the class dimension turns logits into probabilities.
    scores = F.softmax(logits, dim=1).cpu().numpy()[0]
    outputs = [{"Emotion": config.id2label[i], "Score": f"{score * 100:.1f}%"} for i, score in enumerate(scores)]
    return outputs
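
The predict function converts the model's logits into percentage scores with a softmax. The dependency-free sketch below shows that step in isolation. The logit values are made up for illustration, and the label order is assumed to match the order of the example output on this page.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    # Subtracting the maximum improves numerical stability
    # and does not change the result.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Assumed label order, taken from the example output on this page.
LABELS = ["anger", "disgust", "enthusiasm", "fear",
          "happiness", "neutral", "sadness"]

# Hypothetical logits for one utterance, one value per class.
example_logits = [0.5, 9.0, -1.0, 0.0, -0.5, 1.0, -2.0]

probs = softmax(example_logits)
for label, p in zip(LABELS, probs):
    print(f"{label}: {p * 100:.1f}%")
```

Because the scores are normalized across all seven classes, a single large logit pushes every other class toward 0%, which is why the example output can show one class at 100.0%.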

Advanced Usage

TRUST = True

config = AutoConfig.from_pretrained('Aniemore/wav2vec2-xlsr-53-russian-emotion-recognition', trust_remote_code=TRUST)
model_ = AutoModel.from_pretrained("Aniemore/wav2vec2-xlsr-53-russian-emotion-recognition", trust_remote_code=TRUST)
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("Aniemore/wav2vec2-xlsr-53-russian-emotion-recognition")

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_.to(device)

result = predict("/path/to/russian_audio_speech.wav", 16000)
print(result)

# outputs
[{'Emotion': 'anger', 'Score': '0.0%'},
 {'Emotion': 'disgust', 'Score': '100.0%'},
 {'Emotion': 'enthusiasm', 'Score': '0.0%'},
 {'Emotion': 'fear', 'Score': '0.0%'},
 {'Emotion': 'happiness', 'Score': '0.0%'},
 {'Emotion': 'neutral', 'Score': '0.0%'},
 {'Emotion': 'sadness', 'Score': '0.0%'}]
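
Since predict returns the scores as formatted strings, picking the most likely emotion requires parsing them back into numbers. The helper below is a minimal sketch of that step. The name top_emotion is hypothetical, and the sample list simply repeats the output shown above.

```python
def top_emotion(results):
    """Return (label, percent) for the highest-scoring entry."""
    def as_float(entry):
        # Scores are strings such as "100.0%"; drop the sign before converting.
        return float(entry["Score"].rstrip("%"))

    best = max(results, key=as_float)
    return best["Emotion"], as_float(best)


sample = [
    {"Emotion": "anger", "Score": "0.0%"},
    {"Emotion": "disgust", "Score": "100.0%"},
    {"Emotion": "enthusiasm", "Score": "0.0%"},
    {"Emotion": "fear", "Score": "0.0%"},
    {"Emotion": "happiness", "Score": "0.0%"},
    {"Emotion": "neutral", "Score": "0.0%"},
    {"Emotion": "sadness", "Score": "0.0%"},
]

print(top_emotion(sample))  # ('disgust', 100.0)
```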

📚 Documentation

Model Index

Name: XLS-R Wav2Vec2 For Russian Speech Emotion Classification by Nikita Davidchuk
Results:
- Task:
  - Name: Audio Emotion Recognition
  - Type: audio-emotion-recognition
- Dataset:
  - Name: Russian Emotional Speech Dialogs
  - Type: Aniemore/resd
  - Args: ru
- Metrics:
  - Name: accuracy
  - Type: accuracy
  - Value: 72%

Performance Metrics

Class	precision	recall	f1-score	support
anger	0.97	0.86	0.92	44
disgust	0.71	0.78	0.74	37
enthusiasm	0.51	0.80	0.62	40
fear	0.80	0.62	0.70	45
happiness	0.66	0.70	0.68	44
neutral	0.81	0.66	0.72	38
sadness	0.79	0.59	0.68	32
accuracy			0.72	280
macro avg	0.75	0.72	0.72	280
weighted avg	0.75	0.72	0.73	280
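
The macro and weighted averages in the table follow from the per-class values. The sketch below recomputes them, assuming the usual definitions: the macro average is the unweighted mean over classes, and the weighted average uses each class's support as its weight. The inputs are the rounded figures from the table, so the results are only expected to match at two decimals.

```python
# Per-class values copied from the table, in the order
# anger, disgust, enthusiasm, fear, happiness, neutral, sadness.
precision = [0.97, 0.71, 0.51, 0.80, 0.66, 0.81, 0.79]
recall = [0.86, 0.78, 0.80, 0.62, 0.70, 0.66, 0.59]
f1 = [0.92, 0.74, 0.62, 0.70, 0.68, 0.72, 0.68]
support = [44, 37, 40, 45, 44, 38, 32]


def macro_avg(values):
    """Unweighted mean over classes."""
    return sum(values) / len(values)


def weighted_avg(values, weights):
    """Mean over classes, weighted by the number of samples per class."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


for name, values in [("precision", precision), ("recall", recall), ("f1-score", f1)]:
    print(name, round(macro_avg(values), 2), round(weighted_avg(values, support), 2))
```

The support-weighted recall equals the overall accuracy, which is why both appear as 0.72.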

📄 License

This project is licensed under the MIT license.

📖 Citations

@misc{Aniemore,
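  author = {Артем Аментес and Илья Лубенец and Никита Давидчук},
  title = {Открытая библиотека искусственного интеллекта для анализа и выявления эмоциональных оттенков речи человека},
  note = {English title: An open artificial intelligence library for analyzing and detecting emotional shades of human speech},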
  year = {2022},
  publisher = {Hugging Face},
  journal = {Hugging Face Hub},
  howpublished = {\url{https://huggingface.com/aniemore/Aniemore}},
  email = {hello@socialcode.ru}
}
