Hubert-large-speech-emotion-recognition-russian-dusha-finetuned Open Source Model - Identify Multiple Emotional States in Russian Speech

Hubert Large Speech Emotion Recognition Russian Dusha Finetuned

Developed by xbgoose

This is a Russian speech emotion recognition model obtained by fine-tuning HuBERT on the DUSHA dataset. It classifies speech as neutral, angry, positive, sad, or other.

Audio Classification

Transformers

Other

Open Source License: Apache-2.0

#Russian Speech Emotion Recognition #High-Accuracy Emotion Classification #HuBERT Fine-Tuning

Downloads 111.13k

Release Time: 5/28/2023

Model Overview

This deep learning model is built for Russian speech emotion recognition. It was fine-tuned from the pre-trained facebook/hubert-large-ls960-ft checkpoint and is intended for speech emotion analysis applications.

Model Features

High-Accuracy Emotion Recognition

Achieves an accuracy of 0.86 and a macro F1 score of 0.81 on the test set, outperforming the dataset baseline.

Optimized for Russian

Fine-tuned on the Russian-language DUSHA dataset, so it targets emotion analysis of Russian speech specifically.

Efficient Fine-Tuning Strategy

Freezes part of the network and trains on only half of the training set, reducing training cost while maintaining performance.
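A minimal sketch of partial freezing, assuming PyTorch and the parameter names used by the Transformers class HubertForSequenceClassification (hubert.encoder.layers.*, projector.*, classifier.*). The helper freeze_all_except and the toy modules are hypothetical; the toy model only mirrors HuBERT's naming so the sketch runs without downloading weights, and it is not the author's training code.

```python
import torch.nn as nn

# Prefixes of parameters left trainable: the encoder layers, projector and
# classifier. Assumes transformers' HubertForSequenceClassification naming.
HUBERT_TRAINABLE = ("hubert.encoder.layers.", "projector.", "classifier.")


def freeze_all_except(model: nn.Module, trainable_prefixes: tuple) -> list:
    """Hypothetical helper: set requires_grad=False on every parameter whose
    name does not start with one of trainable_prefixes. Returns the names of
    the parameters that remain trainable."""
    trainable = []
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(trainable_prefixes)
        if param.requires_grad:
            trainable.append(name)
    return trainable


# Toy stand-in that only mirrors HuBERT's parameter naming.
class ToyEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(4, 4) for _ in range(2)])
        self.layer_norm = nn.LayerNorm(4)


class ToyHubert(nn.Module):
    def __init__(self):
        super().__init__()
        self.feature_extractor = nn.Conv1d(1, 4, kernel_size=3)
        self.feature_projection = nn.Linear(4, 4)
        self.encoder = ToyEncoder()


class ToyHubertClassifier(nn.Module):
    def __init__(self, num_labels: int = 5):
        super().__init__()
        self.hubert = ToyHubert()
        self.projector = nn.Linear(4, 4)
        self.classifier = nn.Linear(4, num_labels)


toy = ToyHubertClassifier()
trainable = freeze_all_except(toy, HUBERT_TRAINABLE)
print(trainable)
```

On the real model, the same call would be freeze_all_except(model, HUBERT_TRAINABLE) right after HubertForSequenceClassification.from_pretrained(...) and before the optimizer is built. The trailing dot in "hubert.encoder.layers." keeps the encoder's final layer_norm frozen.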

Model Capabilities

Russian Speech Emotion Classification

Audio Feature Extraction

Emotion State Recognition

Use Cases

Emotion Analysis

Customer Service Voice Emotion Monitoring

Analyzes customer emotional changes during service calls.

Can identify negative emotions like anger for timely alerts.
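A sketch of how call monitoring could work, assuming the recording is cut into overlapping windows that are each classified with the model (the classification step itself is left out). window_bounds and anger_alerts are hypothetical helpers; the 10-second window mirrors the max_length in the usage example, while the 5-second hop and the two-window threshold are arbitrary assumptions.

```python
def window_bounds(num_samples, sample_rate=16000, window_s=10.0, hop_s=5.0):
    """Hypothetical helper: (start, end) sample indices of overlapping windows
    covering the whole recording; the last window may be shorter."""
    window = int(window_s * sample_rate)
    hop = int(hop_s * sample_rate)
    if num_samples <= window:
        return [(0, num_samples)]
    bounds = []
    start = 0
    while start + window < num_samples:
        bounds.append((start, start + window))
        start += hop
    bounds.append((start, num_samples))
    return bounds


def anger_alerts(labels, min_consecutive=2):
    """Hypothetical helper: indices of windows at which 'angry' has been
    predicted for min_consecutive windows in a row (one alert per run)."""
    alerts = []
    run = 0
    for i, label in enumerate(labels):
        run = run + 1 if label == "angry" else 0
        if run == min_consecutive:
            alerts.append(i)
    return alerts


# A 25-second call at 16 kHz. The per-window labels are made up; in practice
# they would come from running the model on each window.
bounds = window_bounds(25 * 16000)
labels = ["neutral", "angry", "angry", "neutral"]
print(bounds)
print(anger_alerts(labels))
```

Requiring several consecutive angry windows trades a little latency for fewer alerts triggered by a single misclassified window.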

Mental Health Assessment

Can help track the emotional state of patients with depression through voice analysis.

Can detect trends in sadness-related emotional changes.
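One way to turn per-window predictions into a trend, sketched under the assumption that each session has already been classified window by window: take the share of windows labeled sad in each session, then fit a least-squares slope across sessions. sad_share and linear_trend are hypothetical helpers and the session data is made up.

```python
def sad_share(labels):
    """Fraction of windows in one session that were classified as 'sad'."""
    return sum(label == "sad" for label in labels) / len(labels)


def linear_trend(values):
    """Least-squares slope of values against session index 0, 1, 2, ...
    Needs at least two sessions."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


# Made-up per-window labels for three consecutive sessions.
sessions = [
    ["sad", "neutral", "neutral", "neutral"],
    ["sad", "sad", "neutral", "neutral"],
    ["sad", "sad", "sad", "neutral"],
]
shares = [sad_share(s) for s in sessions]
print(shares, linear_trend(shares))
```

A positive slope only says that the sad share grew across the sessions analyzed; it says nothing about why.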

Human-Computer Interaction

Intelligent Voice Assistant

Adjusts response strategies based on user voice emotions.

Provides a more human-like interaction experience.

🚀 HuBERT fine-tuned on the DUSHA dataset for speech emotion recognition in Russian

This project fine-tunes the HuBERT model on the DUSHA dataset to achieve speech emotion recognition in Russian, improving accuracy and F1 score compared to the baseline.

🚀 Quick Start

The base (pre-trained) model is facebook/hubert-large-ls960-ft, and the fine-tuning data comes from the DUSHA dataset.

✨ Features

Fine-tuning Environment: Fine-tuned in Google Colab using a Pro account with an A100 GPU.
Layer Freezing: All layers were frozen except the projector, classifier, and all 24 HubertEncoderLayerStableLayerNorm layers.
Dataset Usage: Half of the training dataset was used.
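The card does not say how the half was chosen. One option, sketched here under that assumption, is a seeded random half drawn separately within each emotion class so the class mix of the training set is preserved. stratified_half is a hypothetical helper, and the labels below are made up.

```python
import random
from collections import defaultdict


def stratified_half(labels, seed=0):
    """Hypothetical helper: indices of a seeded random half of the examples,
    drawn separately within each class so class proportions are preserved
    (each class contributes len // 2 examples)."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    chosen = []
    for label in sorted(by_class):  # fixed order keeps the result reproducible
        indices = by_class[label]
        chosen.extend(rng.sample(indices, len(indices) // 2))
    return sorted(chosen)


# Made-up labels for a tiny, imbalanced training set.
train_labels = ["neutral"] * 6 + ["angry"] * 4 + ["sad"] * 2
half = stratified_half(train_labels)
print(half)
```

With a Hugging Face datasets.Dataset, the resulting indices could be applied with train_dataset.select(half).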

📦 Installation

The usage example below requires the transformers, torch and torchaudio packages, for example: pip install transformers torch torchaudio

💻 Usage Examples

Basic Usage

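from transformers import HubertForSequenceClassification, Wav2Vec2FeatureExtractor
import torchaudio
import torch

# Preprocessing comes from the base model; classifier weights from the fine-tuned checkpoint
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/hubert-large-ls960-ft")
model = HubertForSequenceClassification.from_pretrained("xbgoose/hubert-large-speech-emotion-recognition-russian-dusha-finetuned")
model.eval()
num2emotion = {0: 'neutral', 1: 'angry', 2: 'positive', 3: 'sad', 4: 'other'}

filepath = "path/to/audio.wav"

# Load the audio, mix it down to mono and resample to the 16 kHz rate the model expects
waveform, sample_rate = torchaudio.load(filepath, normalize=True)
waveform = waveform.mean(dim=0)
if sample_rate != feature_extractor.sampling_rate:
    resample = torchaudio.transforms.Resample(sample_rate, feature_extractor.sampling_rate)
    waveform = resample(waveform)

# Normalize the 1-D waveform into a (1, num_samples) batch; audio longer than 10 s is truncated
inputs = feature_extractor(
    waveform.numpy(),
    sampling_rate=feature_extractor.sampling_rate,
    return_tensors="pt",
    padding=True,
    max_length=16000 * 10,
    truncation=True,
)

# Inference only, so skip gradient tracking
with torch.no_grad():
    logits = model(inputs["input_values"]).logits  # shape: (1, 5)
predicted_id = torch.argmax(logits, dim=-1).item()
predicted_emotion = num2emotion[predicted_id]
print(predicted_emotion)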
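
The example above prints only the top class. To see the model's confidence for every label, a softmax over the logits can be used; this sketch does it in plain Python so it runs without the model. emotion_probabilities is a hypothetical helper, and the logits passed to it are made-up numbers.

```python
import math

num2emotion = {0: "neutral", 1: "angry", 2: "positive", 3: "sad", 4: "other"}


def emotion_probabilities(logits):
    """Hypothetical helper: softmax over one row of logits, returned as
    {emotion: probability}. Subtracting the max keeps exp() from overflowing."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return {num2emotion[i]: e / total for i, e in enumerate(exps)}


# Made-up logits standing in for model(...).logits[0].tolist()
probs = emotion_probabilities([2.0, 0.5, 0.1, -1.0, 0.0])
for emotion, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{emotion}: {p:.3f}")
```

With the real model, torch.softmax(logits, dim=-1) yields the same probabilities as a tensor; the plain-Python version just makes the arithmetic explicit.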

📚 Documentation

Training Parameters

2 epochs
Train batch size = 8
Eval batch size = 8
Gradient accumulation steps = 4 (effective train batch size = 8 × 4 = 32)
Learning rate = 5e-5, constant (no warm-up, no decay)
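The card does not say which training loop was used. Assuming the Transformers Trainer, the listed values would map onto TrainingArguments keywords roughly as below; the sketch also works out the effective batch size and an approximate number of optimizer updates. The helper names and the dataset size of 1,000 in the example are hypothetical.

```python
import math

# Assumed mapping of the listed hyperparameters onto TrainingArguments keywords.
hparams = {
    "num_train_epochs": 2,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "gradient_accumulation_steps": 4,
    "learning_rate": 5e-5,
    "warmup_steps": 0,                # no warm-up
    "lr_scheduler_type": "constant",  # no decay
}


def effective_batch_size(hp, num_devices=1):
    """Examples consumed per optimizer update."""
    return hp["per_device_train_batch_size"] * hp["gradient_accumulation_steps"] * num_devices


def approx_optimizer_steps(num_examples, hp, num_devices=1):
    """Approximate total optimizer updates: one per effective batch, per epoch.
    The Trainer's exact count can differ slightly in how it handles the last
    partial batch."""
    steps_per_epoch = math.ceil(num_examples / effective_batch_size(hp, num_devices))
    return steps_per_epoch * hp["num_train_epochs"]


print(effective_batch_size(hparams), approx_optimizer_steps(1000, hparams))
```

If the Trainer was indeed used, these could be passed as TrainingArguments(output_dir="out", **hparams). num_devices=1 reflects the single A100 mentioned in the features list.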

Metrics

On the test set, the model achieved:

Accuracy = 0.86
Balanced accuracy = 0.76
Macro F1 score = 0.81

Both accuracy and F1 score improve on the dataset baseline.
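For reference, the reported metrics can be computed as in this plain-Python sketch, which should match the definitions behind scikit-learn's accuracy_score, balanced_accuracy_score and f1_score(average="macro"). The helper name and the toy labels are hypothetical; the toy data shows how accuracy can sit well above balanced accuracy when one class dominates.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Hypothetical helper returning (accuracy, balanced accuracy, macro F1)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    true_count = Counter(y_true)
    pred_count = Counter(y_pred)

    accuracy = sum(tp.values()) / len(y_true)

    # Balanced accuracy: unweighted mean of per-class recall (classes in y_true)
    recalls = [tp[c] / true_count[c] for c in labels if true_count[c]]
    balanced = sum(recalls) / len(recalls)

    # Macro F1: unweighted mean of per-class F1 (0 when precision + recall is 0)
    f1s = []
    for c in labels:
        precision = tp[c] / pred_count[c] if pred_count[c] else 0.0
        recall = tp[c] / true_count[c] if true_count[c] else 0.0
        f1s.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    macro_f1 = sum(f1s) / len(f1s)
    return accuracy, balanced, macro_f1


# Made-up, imbalanced example: 8 neutral and 2 angry utterances.
y_true = ["neutral"] * 8 + ["angry"] * 2
y_pred = ["neutral"] * 8 + ["neutral", "angry"]
acc, bal, f1 = classification_metrics(y_true, y_pred)
print(f"accuracy={acc:.3f} balanced={bal:.3f} macro_f1={f1:.3f}")
```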


📄 License

The project is licensed under the apache-2.0 license.
