w2v-speech-emotion-recognition open-source speech model - Free deployment to recognize six emotional states in English

W2v Speech Emotion Recognition

Developed by Khoa

A Wav2Vec2-fine-tuned English speech emotion recognition model capable of identifying six emotional states

Audio Classification

Safetensors

English

Open Source License: MIT

#English speech emotion analysis #Multi-emotion classification #Wav2Vec2 fine-tuning

Downloads 147

Release Time: 8/27/2024

Model Overview

This model recognizes six emotional states in English speech: sadness, anger, disgust, fear, happiness, and neutrality. It is a Wav2Vec2 model fine-tuned on the Speech Emotion Recognition Dataset from Kaggle.

Model Features

Multi-emotion recognition

Capable of identifying six different emotional states: sadness, anger, disgust, fear, happiness, and neutrality

High accuracy

Achieves an accuracy of 0.7435 on the test set. Recall is highest for anger (0.93) and neutral (0.92) and lowest for fear (0.61) and disgust (0.64)

Based on Wav2Vec2 architecture

Leverages the powerful feature extraction capabilities of Wav2Vec2, making it suitable for speech emotion recognition tasks

Model Capabilities

English speech emotion recognition

Six-emotion classification

Audio feature extraction

Use Cases

Emotion analysis

Customer service call analysis

Analyze customer emotions in service calls

Helps identify dissatisfied customers and improve service quality

Mental health monitoring

Monitor user emotional states through speech analysis

Assists in mental health assessment and early intervention

Human-computer interaction

Smart assistant emotional response

Enables smart assistants to adjust responses based on user speech emotions

Enhances the naturalness and emotional resonance of human-computer interaction

🚀 Wav2Vec2 Speech Emotion Recognition for English

This model uses the Wav2Vec2 architecture to recognize emotions in English speech. Given an audio clip, it scores six emotions (sadness, anger, disgust, fear, happiness, and neutral), giving a view of the emotional tone of spoken English.

🚀 Quick Start

To use this model, you first need to install the necessary packages. Then, you can load the model and perform emotion classification on an English audio file.

Installation

pip install transformers
pip install torchaudio

Example Usage

from transformers import pipeline

# Load the fine-tuned model and feature extractor
pipe = pipeline("audio-classification", model="Khoa/w2v-speech-emotion-recognition")

# Path to your audio file
audio_file = "path_to_your_audio_file.wav"

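# Perform emotion classification. By default the pipeline returns the five
# highest-scoring labels; pass top_k=6 to get scores for all six emotions.
predictions = pipe(audio_file)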

# Map predictions to real emotion labels
label_map = {
    "LABEL_0": "sadness",
    "LABEL_1": "angry",
    "LABEL_2": "disgust",
    "LABEL_3": "fear",
    "LABEL_4": "happy",
    "LABEL_5": "neutral"
}

# Convert predictions to readable labels (labels missing from the map are kept as-is)
mapped_predictions = [
    {"score": pred["score"], "label": label_map.get(pred["label"], pred["label"])}
    for pred in predictions
]

# Display results
print(mapped_predictions)

✨ Features

Emotion Detection: Detects six emotions in English speech: sadness, anger, disgust, fear, happiness, and neutral.
Fine-tuned Model: Fine-tuned on the Speech Emotion Recognition Dataset (Kaggle) for emotion recognition tasks.

📦 Model Details

Property	Details
Model Type	Wav2Vec2
Languages	English
Training Data	Speech Emotion Recognition Dataset (Kaggle)
Emotions Detected	Sadness, Anger, Disgust, Fear, Happiness, Neutral

🔧 Technical Details

The model was fine-tuned on the Speech Emotion Recognition Dataset using the Wav2Vec2 architecture. Training ran for multiple epochs with a learning rate of 1e-5.

📚 Documentation

Training Results

The model achieved the following results on the test set:

Test Accuracy: 0.7435

Classification Report:

              precision    recall  f1-score   support

     sadness       0.68      0.71      0.70       251
       angry       0.75      0.93      0.83       258
     disgust       0.86      0.64      0.73       250
        fear       0.75      0.61      0.67       287
       happy       0.73      0.68      0.71       231
     neutral       0.72      0.92      0.81       212

    accuracy                           0.74      1489
   macro avg       0.75      0.75      0.74      1489
weighted avg       0.75      0.74      0.74      1489
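
The summary rows of the report can be recomputed from the per-class rows. The sketch below copies the rounded values from the table above, so it agrees with the report only to two decimals. The names `report`, `macro_avg`, and `weighted_avg` are illustrative and not part of the model's code.

```python
# Per-class rows copied from the classification report above:
# label -> (precision, recall, f1, support)
report = {
    "sadness": (0.68, 0.71, 0.70, 251),
    "angry":   (0.75, 0.93, 0.83, 258),
    "disgust": (0.86, 0.64, 0.73, 250),
    "fear":    (0.75, 0.61, 0.67, 287),
    "happy":   (0.73, 0.68, 0.71, 231),
    "neutral": (0.72, 0.92, 0.81, 212),
}


def macro_avg(rows, col):
    """Unweighted mean of one metric column across classes."""
    return sum(r[col] for r in rows.values()) / len(rows)


def weighted_avg(rows, col):
    """Mean of one metric column, weighted by each class's support."""
    total = sum(r[3] for r in rows.values())
    return sum(r[col] * r[3] for r in rows.values()) / total


total_support = sum(r[3] for r in report.values())
macro = [round(macro_avg(report, c), 2) for c in range(3)]
weighted = [round(weighted_avg(report, c), 2) for c in range(3)]

# Class with the lowest recall, i.e. the emotion most often missed
weakest = min(report, key=lambda name: report[name][1])

print(total_support, macro, weighted, weakest)
```

The support-weighted recall is, by definition, the overall accuracy, which is why the weighted recall (0.74) matches the accuracy row. The lowest-recall class here is fear.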

Example Output

[
    {"score": 0.95, "label": "angry"},
    {"score": 0.02, "label": "happy"},
    {"score": 0.01, "label": "disgust"},
    {"score": 0.01, "label": "neutral"},
    {"score": 0.01, "label": "fear"}
]
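
A common next step is to reduce this list to a single decision. The sketch below picks the highest-scoring entry and falls back to a placeholder when that score is below a cutoff. The function name `top_emotion`, the 0.5 threshold, and the "uncertain" label are assumptions made for illustration. They are not part of the model or of the transformers API.

```python
def top_emotion(predictions, min_score=0.5):
    """Return the label with the highest score, or "uncertain" when the
    best score is below min_score (a hypothetical cutoff; tune it on your data)."""
    if not predictions:
        return "uncertain"
    best = max(predictions, key=lambda p: p["score"])
    return best["label"] if best["score"] >= min_score else "uncertain"


# The example output shown above
example = [
    {"score": 0.95, "label": "angry"},
    {"score": 0.02, "label": "happy"},
    {"score": 0.01, "label": "disgust"},
    {"score": 0.01, "label": "neutral"},
    {"score": 0.01, "label": "fear"},
]

print(top_emotion(example))
```

A cutoff like this is one simple way to avoid acting on low-confidence predictions. Given the lower recall for fear and disgust in the report, a per-class threshold may be worth considering.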

📄 License

This model is released under the MIT license.

⚠️ Important Note

This model is specifically trained on English speech data and may not perform well on other languages or dialects. Additionally, as with any machine learning model, there may be biases present in the training data that could affect the model's predictions.
