wav2vec2-base-Speech_Emotion_Recognition: Open-Source Model for Predicting the Emotions of Audio Speakers

Wav2vec2 Base Speech Emotion Recognition

Developed by DunnBC22

A speech emotion recognition model, fine-tuned from facebook/wav2vec2-base, that predicts the speaker's emotion in audio samples.

Audio Classification

Transformers

Language: English
License: Apache-2.0
Tags: #Speech Emotion Analysis #English Speech Processing #wav2vec2 Fine-tuning

Downloads 128

Release Time : 4/17/2023

Model Overview

This model identifies a speaker's emotional state from the speech signal. It is suited to scenarios such as emotion analysis and human-computer interaction.

Model Features

High Accuracy

Achieves 75.39% accuracy on the evaluation set across multiple emotion classes.

Multi-metric Evaluation

Reports F1 score, recall, and precision (each with weighted, micro, and macro averaging) alongside accuracy, so that performance can be judged across classes rather than from a single number.

Based on wav2vec2

Fine-tuned from facebook/wav2vec2-base, inheriting its powerful speech feature extraction capabilities.

Model Capabilities

Speech Emotion Recognition

Audio Classification

Emotion Analysis

Use Cases

Human-Computer Interaction

Intelligent Customer Service Emotion Analysis

Used to analyze emotional states in customer speech to improve service quality.

Mental Health

Emotional State Monitoring

Tracks changes in users' emotional state through speech, as a supplementary aid in mental health assessment.

🚀 wav2vec2-base-Speech_Emotion_Recognition

This model is a fine-tuned version of facebook/wav2vec2-base that predicts the emotion of the speaker in an audio sample.

🚀 Quick Start

The model achieves the following results on the evaluation set:

Loss: 0.7264
Accuracy: 0.7539
F1
- Weighted: 0.7514
- Micro: 0.7539
- Macro: 0.7529
Recall
- Weighted: 0.7539
- Micro: 0.7539
- Macro: 0.7577
Precision
- Weighted: 0.7565
- Micro: 0.7539
- Macro: 0.7558
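The weighted, micro, and macro figures above differ only in how per-class scores are combined. The following is a minimal sketch of the three averages for F1, using a small made-up label set (the labels and predictions are illustrative, not the model's evaluation data). It also shows why the micro-averaged values all equal the accuracy: in single-label multi-class classification, every misclassification counts as exactly one false positive and one false negative.

```python
from collections import Counter


def per_class_counts(y_true, y_pred):
    """Count true positives, false positives, and false negatives per class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    return tp, fp, fn


def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def averaged_f1(y_true, y_pred):
    """Return macro, weighted, and micro F1 for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = per_class_counts(y_true, y_pred)
    support = Counter(y_true)  # number of true samples per class
    per_class = {c: f1_from_counts(tp[c], fp[c], fn[c]) for c in labels}
    return {
        # Macro: every class counts equally, regardless of size.
        "macro": sum(per_class.values()) / len(labels),
        # Weighted: each class counts in proportion to its support.
        "weighted": sum(per_class[c] * support[c] for c in labels) / len(y_true),
        # Micro: pool the counts over all classes first, then compute F1.
        "micro": f1_from_counts(sum(tp.values()), sum(fp.values()), sum(fn.values())),
    }


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative data only; these labels are hypothetical.
y_true = ["angry", "angry", "angry", "angry", "happy", "happy", "sad", "sad", "sad", "neutral"]
y_pred = ["angry", "angry", "happy", "angry", "happy", "sad", "sad", "sad", "angry", "neutral"]

scores = averaged_f1(y_true, y_pred)
print(scores, accuracy(y_true, y_pred))
```

A gap between the macro and weighted values indicates that the classes are imbalanced or that performance is uneven across them.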

✨ Features

Model description

This model predicts the emotion of the person speaking in the audio sample.
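The source page does not include usage code. A minimal sketch of how such a model is typically loaded is shown here, using the Transformers `audio-classification` pipeline. The model identifier is an assumption inferred from the author and model name given on this page, and `top_emotion` and `classify` are hypothetical helper names introduced for illustration. The emotion label names are not listed on this page, so the sketch does not assume any.

```python
def top_emotion(predictions):
    """Return the (label, score) pair with the highest score.

    `predictions` is a list of {"label": str, "score": float} dicts,
    which is the shape returned by an audio-classification pipeline.
    """
    best = max(predictions, key=lambda p: p["score"])
    return best["label"], best["score"]


def classify(audio_path,
             model_id="DunnBC22/wav2vec2-base-Speech_Emotion_Recognition"):
    """Classify one audio file and return the pipeline's raw predictions.

    Assumption: `model_id` is inferred from the author and model name
    on this page; verify it before relying on it.
    """
    # Imported lazily so that top_emotion() works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("audio-classification", model=model_id)
    return classifier(audio_path)


if __name__ == "__main__":
    import sys

    # Usage: python classify_emotion.py path/to/sample.wav
    if len(sys.argv) == 2:
        label, score = top_emotion(classify(sys.argv[1]))
        print(f"{label}: {score:.3f}")
```

Passing a file path to the pipeline requires ffmpeg for decoding. wav2vec2-base was pretrained on 16 kHz audio, so input at other sampling rates needs resampling, which the pipeline handles when it decodes a file itself.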

For more information on how it was created, check out the following link: https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/tree/main/Audio-Projects/Emotion%20Detection/Speech%20Emotion%20Detection

Intended uses & limitations

This model is intended to demonstrate my ability to solve a complex problem using technology.


📚 Documentation

Training and evaluation data

Dataset Source: https://www.kaggle.com/datasets/dmitrybabko/speech-emotion-recognition-en

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 3e-05
train_batch_size: 32
eval_batch_size: 32
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 10
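Two of the values above are derived rather than set directly: the total train batch size is the per-step batch size multiplied by the gradient accumulation steps, and the learning rate at any step follows from the linear schedule and the warmup ratio. The sketch below works through both. It assumes a single device (consistent with 32 × 4 = 128), takes the total of 430 optimizer steps from the last row of the training results table, and assumes the usual linear shape: a linear ramp from 0 to the peak rate during warmup, then a linear decay to 0. The function names are hypothetical.

```python
import math

LEARNING_RATE = 3e-05
TRAIN_BATCH_SIZE = 32
GRAD_ACCUM_STEPS = 4
WARMUP_RATIO = 0.1
TOTAL_STEPS = 430  # assumption: final step in the training results table


def effective_batch_size(per_device, accum_steps, num_devices=1):
    """Samples contributing to each optimizer update."""
    return per_device * accum_steps * num_devices


def linear_schedule_lr(step, base_lr, total_steps, warmup_ratio):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


print("effective batch size:", effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
for step in (0, 20, 43, 215, 430):
    lr = linear_schedule_lr(step, LEARNING_RATE, TOTAL_STEPS, WARMUP_RATIO)
    print(f"step {step:>3}: lr = {lr:.3e}")
```

Under these assumptions the warmup covers the first 43 steps, which is roughly the first epoch, and the rate peaks at 3e-05 before decaying for the remaining nine epochs.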

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy	Weighted F1	Micro F1	Macro F1	Weighted Recall	Micro Recall	Macro Recall	Weighted Precision	Micro Precision	Macro Precision
1.5581	0.98	43	1.4046	0.4653	0.4080	0.4653	0.4174	0.4653	0.4653	0.4793	0.5008	0.4653	0.4974
1.5581	1.98	86	1.1566	0.5997	0.5836	0.5997	0.5871	0.5997	0.5997	0.6093	0.6248	0.5997	0.6209
1.5581	2.98	129	0.9733	0.6883	0.6845	0.6883	0.6860	0.6883	0.6883	0.6923	0.7012	0.6883	0.7009
1.5581	3.98	172	0.8313	0.7399	0.7392	0.7399	0.7409	0.7399	0.7399	0.7417	0.7415	0.7399	0.7432
1.5581	4.98	215	0.8708	0.7028	0.6963	0.7028	0.6970	0.7028	0.7028	0.7081	0.7148	0.7028	0.7114
1.5581	5.98	258	0.7969	0.7297	0.7267	0.7297	0.7277	0.7297	0.7297	0.7333	0.7393	0.7297	0.7382
1.5581	6.98	301	0.7349	0.7603	0.7613	0.7603	0.7631	0.7603	0.7603	0.7635	0.7699	0.7603	0.7702
1.5581	7.98	344	0.7714	0.7469	0.7444	0.7469	0.7456	0.7469	0.7469	0.7485	0.7554	0.7469	0.7563
1.5581	8.98	387	0.7183	0.7630	0.7615	0.7630	0.7631	0.7630	0.7630	0.7652	0.7626	0.7630	0.7637
1.5581	9.98	430	0.7264	0.7539	0.7514	0.7539	0.7529	0.7539	0.7539	0.7577	0.7565	0.7539	0.7558
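The headline metrics quoted for this model match the final row (step 430), but validation loss and accuracy do not improve monotonically across epochs. A short sketch for scanning the table for the best evaluation point follows. The values are copied from the step, validation loss, and accuracy columns above; `EvalPoint` is a hypothetical name. The page does not state which checkpoint the published weights correspond to.

```python
from typing import NamedTuple


class EvalPoint(NamedTuple):
    epoch: float
    step: int
    val_loss: float
    accuracy: float


# Values copied from the training results table above.
history = [
    EvalPoint(0.98, 43, 1.4046, 0.4653),
    EvalPoint(1.98, 86, 1.1566, 0.5997),
    EvalPoint(2.98, 129, 0.9733, 0.6883),
    EvalPoint(3.98, 172, 0.8313, 0.7399),
    EvalPoint(4.98, 215, 0.8708, 0.7028),
    EvalPoint(5.98, 258, 0.7969, 0.7297),
    EvalPoint(6.98, 301, 0.7349, 0.7603),
    EvalPoint(7.98, 344, 0.7714, 0.7469),
    EvalPoint(8.98, 387, 0.7183, 0.7630),
    EvalPoint(9.98, 430, 0.7264, 0.7539),
]

best_by_loss = min(history, key=lambda p: p.val_loss)
best_by_accuracy = max(history, key=lambda p: p.accuracy)
final = history[-1]

print("lowest validation loss:", best_by_loss)
print("highest accuracy:      ", best_by_accuracy)
print("final evaluation:      ", final)
```

Both criteria select the evaluation at step 387 (validation loss 0.7183, accuracy 0.7630), slightly ahead of the final step's 0.7264 and 0.7539.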

Framework versions

Transformers 4.26.1
PyTorch 2.0.0+cu118
Datasets 2.11.0
Tokenizers 0.13.3


📄 License

This model is licensed under the Apache 2.0 license.
