
Wav2vec2 Large Robust 12 Ft Emotion Msp Dim

Developed by audeering
This model is fine-tuned from Wav2Vec2-Large-Robust for speech emotion recognition, predicting values in three dimensions: arousal, dominance, and valence.
Downloads 394.51k
Release Time: 4/6/2022

Model Overview

The model takes a raw audio signal as input and outputs predictions for three dimensions, arousal, dominance, and valence, each approximately in the range 0 to 1. It also provides the pooled state of the final transformer layer.
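
As a rough illustration of working with the three-value output, the sketch below attaches dimension names to one prediction. The helper name label_predictions is hypothetical, and the ordering (arousal, dominance, valence) is assumed from the order in which the dimensions are listed here. Because the values are only approximately in the range 0 to 1, clamping is shown as an optional step rather than something the model does itself.

```python
from typing import Dict, Sequence

# Assumption: the output order matches the order the dimensions are listed in.
DIMENSIONS = ("arousal", "dominance", "valence")


def label_predictions(values: Sequence[float], clamp: bool = False) -> Dict[str, float]:
    """Attach dimension names to one three-value prediction (hypothetical helper)."""
    if len(values) != len(DIMENSIONS):
        raise ValueError(f"expected {len(DIMENSIONS)} values, got {len(values)}")
    scores = [float(v) for v in values]
    if clamp:
        # Predictions may fall slightly outside 0..1; clamp only if the caller asks.
        scores = [min(1.0, max(0.0, s)) for s in scores]
    return dict(zip(DIMENSIONS, scores))
```

Calling label_predictions(raw_output, clamp=True) on one prediction gives a dictionary keyed by dimension name, which is usually easier to log or compare than a bare list.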

Model Features

Dimensional Emotion Recognition
Predicts continuous dimensional values for arousal, dominance, and valence, rather than discrete emotion categories.
Fine-tuned Pre-trained Model
Fine-tuned from Wav2Vec2-Large-Robust, leveraging the advantages of large-scale self-supervised pre-training.
Feature Extraction Capability
Can output the pooled state of the final transformer layer as a speech feature vector.
Model Optimization
The original 24-layer Transformer was pruned to 12 layers, balancing performance and efficiency.
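
To make the "pooled state" mentioned under Feature Extraction Capability more concrete, here is a minimal sketch of one common pooling strategy: averaging the frame-level vectors of the final layer over time. The text above does not say which pooling the model uses, so mean pooling is an assumption, and mean_pool is a hypothetical helper written in plain Python for illustration.

```python
from typing import List, Sequence


def mean_pool(hidden_states: Sequence[Sequence[float]]) -> List[float]:
    """Average frame-level vectors over time into one utterance-level vector.

    Assumption: mean pooling; the source only says "pooled state".
    """
    if not hidden_states:
        raise ValueError("need at least one frame")
    width = len(hidden_states[0])
    if any(len(frame) != width for frame in hidden_states):
        raise ValueError("all frames must have the same width")
    n_frames = len(hidden_states)
    # One average per feature index, taken across all frames.
    return [sum(frame[i] for frame in hidden_states) / n_frames for i in range(width)]
```

The result has the same width as a single frame regardless of how long the audio is, which is what makes a pooled state usable as a fixed-size speech feature vector.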

Model Capabilities

Speech Emotion Analysis
Speech Feature Extraction
Continuous Dimensional Emotion Prediction

Use Cases

Human-Computer Interaction
Intelligent Customer Service Emotion Analysis
Analyze emotional states in user speech to optimize customer service response strategies.
Quantifies changes in user emotions.
Mental Health
Emotional State Monitoring
Monitor emotional fluctuations in patients with psychological conditions such as depression through speech analysis.
Provides objective dimensional emotion indicators.