SER Odyssey Baseline WavLM Multi Attributes
A multi-attribute speech emotion recognition baseline model based on the WavLM architecture that predicts three emotional dimensions: arousal, dominance, and valence
Downloads 23.09k
Release Date: 3/5/2024
Model Overview
This speech emotion recognition model was trained on the MSP-Podcast dataset and developed as a baseline for the Odyssey 2024 Speech Emotion Recognition Challenge. For each input utterance it predicts three emotional dimensions at once (arousal, dominance, and valence), each as a value between 0 and 1.
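To make the output format concrete, here is a minimal sketch of how a caller might wrap the three predicted values. The `EmotionScores` type and `scores_from_output` helper are hypothetical names introduced for illustration, and the output order (arousal, dominance, valence) is an assumption taken from the order in which this page lists the dimensions; it should be checked against the model's own documentation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class EmotionScores:
    """Hypothetical container for the three predicted dimensions."""
    arousal: float
    dominance: float
    valence: float


def scores_from_output(values: Sequence[float]) -> EmotionScores:
    """Wrap a 3-element model output, checking the documented 0-1 range.

    Assumes the order (arousal, dominance, valence); verify before relying on it.
    """
    if len(values) != 3:
        raise ValueError(f"expected 3 values, got {len(values)}")
    for v in values:
        if not 0.0 <= v <= 1.0:
            raise ValueError(f"value {v} is outside the expected range [0, 1]")
    arousal, dominance, valence = (float(v) for v in values)
    return EmotionScores(arousal, dominance, valence)
```

Validating the range at this boundary surfaces a mismatched model or a preprocessing mistake early instead of letting out-of-range scores flow into downstream logic.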
Model Features
Multi-Attribute Emotion Prediction
Predicts arousal, dominance, and valence simultaneously, giving a dimensional description of the emotion expressed in an utterance
Trained on MSP-Podcast Dataset
Trained on MSP-Podcast, a corpus of naturalistic emotional speech drawn from podcast recordings
Standardized Audio Processing
Built-in mean/standard deviation normalization rescales the input waveform so that audio is presented to the model on a consistent scale
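Mean/standard deviation normalization can be sketched as below. This is an illustration, not the model's actual preprocessing code: `normalize_waveform` is a hypothetical helper, and the page does not say whether the statistics are stored with the model or computed per utterance, so the sketch takes them as parameters. The small `eps` term is an assumption added to guard against division by zero.

```python
from typing import List, Sequence


def normalize_waveform(
    samples: Sequence[float],
    mean: float,
    std: float,
    eps: float = 1e-6,
) -> List[float]:
    """Shift samples by `mean` and scale by `std` (plus a small epsilon)."""
    return [(s - mean) / (std + eps) for s in samples]
```

With `mean=2.0`, `std=1.0`, and `eps=0.0`, the samples `[1.0, 3.0]` map to `[-1.0, 1.0]`.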
Model Capabilities
Speech Emotion Recognition
Arousal Prediction
Dominance Prediction
Valence Prediction
Audio Classification
Use Cases
Affective Computing
Speech Emotion Analysis
Analyzes emotional states in speech for psychological research or user experience evaluation
Outputs a score for each of three emotional dimensions: arousal, dominance, and valence
Human-Computer Interaction
Intelligent Customer Service Emotion Recognition
Real-time identification of emotional states in user speech to optimize customer service response strategies
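One way a customer service application might act on a stream of predictions is to smooth recent scores and flag calls that look agitated and negative. The sketch below is illustrative only: the `EscalationMonitor` class, the window size, and both thresholds are assumptions made for this example and are not part of the model or its documentation.

```python
from collections import deque


class EscalationMonitor:
    """Hypothetical helper: flags a call when smoothed scores look agitated and negative."""

    def __init__(self, window: int = 5, low_valence: float = 0.3, high_arousal: float = 0.7):
        # Thresholds are illustrative assumptions, not values from the model card.
        self.low_valence = low_valence
        self.high_arousal = high_arousal
        self._arousal = deque(maxlen=window)
        self._valence = deque(maxlen=window)

    def update(self, arousal: float, valence: float) -> bool:
        """Add the latest per-segment scores and return True if escalation is suggested."""
        self._arousal.append(arousal)
        self._valence.append(valence)
        mean_arousal = sum(self._arousal) / len(self._arousal)
        mean_valence = sum(self._valence) / len(self._valence)
        return mean_valence < self.low_valence and mean_arousal > self.high_arousal
```

Averaging over a short window keeps a single noisy segment from triggering a change in response strategy; suitable thresholds would need to be tuned on real call data.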