The open-source model wav2vec2-large-xlsr-53-english-finetuned-ravdess

Developed by firdho26

A speech emotion recognition model, fine-tuned from wav2vec2-large-xlsr-53-english on the RAVDESS dataset

License: Apache-2.0

Tags: Speech Emotion Recognition, High-precision Audio Classification, English Speech Processing

Downloads 68

Release date: 1/30/2024

Model Overview

This is a deep learning model for English speech emotion recognition: given a speech recording, it predicts the emotion category expressed by the speaker.

Model Features

High Accuracy Emotion Recognition

Achieves 82.99% accuracy on the RAVDESS evaluation set

Fine-tuned Based on Pre-trained Model

Uses transfer learning from the pre-trained jonatasgrosman/wav2vec2-large-xlsr-53-english model

Multi-metric Evaluation

Reports accuracy, precision, recall, and F1 score on the evaluation set

Model Capabilities

Speech Emotion Classification

English Speech Analysis

Audio Feature Extraction

Use Cases

Affective Computing

Speech Emotion Analysis

Analyze emotional states in speech recordings

Can identify multiple emotional categories

Human-Computer Interaction

Intelligent Customer Service Emotion Recognition

Identify emotional states in customer speech

Helps customer service systems provide more human-like responses

🚀 wav2vec2-large-xlsr-53-english-finetuned-ravdess

This model is a fine-tuned version of jonatasgrosman/wav2vec2-large-xlsr-53-english on the RAVDESS dataset for audio classification, reaching 82.99% accuracy on the evaluation set.
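A minimal usage sketch follows. It assumes the checkpoint is published on the Hugging Face Hub under the repository id `firdho26/wav2vec2-large-xlsr-53-english-finetuned-ravdess` (inferred from the developer and model names, not confirmed by this card), and that `transformers`, PyTorch, and ffmpeg (for decoding audio file paths) are installed. The helper names `load_classifier`, `top_emotion`, and `classify_file` are hypothetical.

```python
# Assumed Hub repository id: inferred from the developer and model names, not confirmed.
MODEL_ID = "firdho26/wav2vec2-large-xlsr-53-english-finetuned-ravdess"


def load_classifier(model_id=MODEL_ID):
    """Build an audio-classification pipeline (downloads the weights on first use)."""
    # Imported lazily so top_emotion() can be used without transformers installed.
    from transformers import pipeline

    return pipeline("audio-classification", model=model_id)


def top_emotion(predictions):
    """Return the (label, score) pair with the highest score.

    `predictions` is the list of {"label": ..., "score": ...} dicts that an
    audio-classification pipeline returns for a single input.
    """
    best = max(predictions, key=lambda p: p["score"])
    return best["label"], best["score"]


def classify_file(path, classifier=None):
    """Classify one speech recording and return its most likely emotion."""
    classifier = classifier or load_classifier()
    return top_emotion(classifier(path))
```

Calling `classify_file("speech.wav")`, where `speech.wav` is a placeholder path to an English recording, would return the predicted label and its score. The label names come from the checkpoint's configuration and are not listed in this card.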

📚 Documentation

The model achieves the following results on the evaluation set:

Loss: 0.5624
Accuracy: 0.8299
Precision: 0.8453
Recall: 0.8299
F1: 0.8330
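Recall is identical to accuracy here, which is what support-weighted averaging over classes produces. The card does not state the averaging mode, so the sketch below only assumes weighted averaging; it computes the four metrics from a confusion matrix, using made-up toy counts rather than the model's actual predictions.

```python
def weighted_metrics(confusion):
    """Compute accuracy and support-weighted precision, recall, and F1.

    confusion[i][j] = number of samples with true class i predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n_classes))

    precision = recall = f1 = 0.0
    for c in range(n_classes):
        tp = confusion[c][c]
        support = sum(confusion[c])                   # true samples of class c
        predicted = sum(row[c] for row in confusion)  # samples predicted as class c
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / total                      # class share of the dataset
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    return {
        "accuracy": correct / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy 3-class confusion matrix (illustrative numbers only).
toy = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
]
metrics = weighted_metrics(toy)
```

With weighted averaging, each class's recall is multiplied by its support share, so the sum collapses to correct predictions divided by total samples, which is accuracy. The reported accuracy also matches 239/288 to the printed precision, which would be consistent with an evaluation set of 288 clips, though the card does not state its size.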

Model Information

Property	Details
Base Model	jonatasgrosman/wav2vec2-large-xlsr-53-english
License	Apache-2.0
Tags	generated_from_trainer
Datasets	narad/ravdess
Metrics	accuracy, precision, recall, f1

Model Index

Name: wav2vec2-large-xlsr-53-english-finetuned-ravdess
- Results:
  - Task:
    - Name: Audio Classification
    - Type: audio-classification
  - Dataset:
    - Name: RAVDESS
    - Type: narad/ravdess
    - Config: all
    - Split: train
    - Args: all
  - Metrics:
    - Name: Accuracy, Type: accuracy, Value: 0.8298611111111112
    - Name: Precision, Type: precision, Value: 0.8453025128787324
    - Name: Recall, Type: recall, Value: 0.8298611111111112
    - Name: F1, Type: f1, Value: 0.8329568451751053

🔧 Technical Details

Training Hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 2
eval_batch_size: 2
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 4
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 6
mixed_precision_training: Native AMP
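To make the batch size and scheduler settings concrete, here is a small sketch of the arithmetic they imply. It assumes single-device training, a total of 1728 optimizer steps (the final step in the training results table), and a warmup length equal to the warmup ratio times the total steps, rounded up. The function `linear_lr` is a hypothetical helper that mirrors a linear warmup followed by linear decay to zero; it is not taken from the training script.

```python
import math

LEARNING_RATE = 5e-5
PER_DEVICE_BATCH = 2
GRAD_ACCUM_STEPS = 2
WARMUP_RATIO = 0.1
TOTAL_STEPS = 1728  # final optimizer step reported in the training results table

# Effective batch size per optimizer step (single device assumed).
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS

# Assumption: warmup length is the ratio times the total steps, rounded up.
warmup_steps = math.ceil(WARMUP_RATIO * TOTAL_STEPS)


def linear_lr(step):
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return LEARNING_RATE * step / warmup_steps
    remaining = max(0, TOTAL_STEPS - step)
    return LEARNING_RATE * remaining / (TOTAL_STEPS - warmup_steps)
```

Under these assumptions the effective batch size is 2 × 2 = 4, matching `total_train_batch_size`, and the learning rate climbs to its 5e-05 peak after 173 steps before decaying linearly to zero at step 1728.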

Training Results

Training Loss	Epoch	Step	Validation Loss	Accuracy	Precision	Recall	F1
1.9765	1.0	288	1.9102	0.3090	0.3203	0.3090	0.1941
1.4803	2.0	576	1.4590	0.5660	0.5493	0.5660	0.4811
1.1625	3.0	864	1.2308	0.6215	0.6299	0.6215	0.5936
0.8354	4.0	1152	0.7821	0.7222	0.7555	0.7222	0.6869
0.2066	5.0	1440	0.7910	0.7708	0.8373	0.7708	0.7881
0.6335	6.0	1728	0.5624	0.8299	0.8453	0.8299	0.8330

Framework Versions

Transformers 4.35.2
PyTorch 2.1.0+cu121
Datasets 2.16.1
Tokenizers 0.15.1

📄 License

This model is licensed under the Apache-2.0 license.
