SER Odyssey Baseline WavLM Multi Attributes

Developed by 3loi
A multi-attribute speech emotion recognition baseline model based on WavLM architecture, predicting arousal, dominance, and valence dimensions
Downloads 23.09k
Release Time: 3/5/2024

Model Overview

This model is a speech emotion recognition model trained on the MSP-Podcast dataset, specifically developed as a baseline for the Odyssey 2024 Emotion Recognition Competition. It simultaneously predicts three emotional dimensions in speech: arousal, dominance, and valence, with output values ranging from 0 to 1.
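As a rough illustration of how the three outputs might be consumed downstream, the sketch below attaches attribute names to a three-element score vector and checks that each value falls in the stated 0 to 1 range. The function name `label_scores`, the attribute order, and the sample values are assumptions made for illustration; they are not taken from the model's own code or documentation.

```python
from typing import Dict, Sequence

# Assumed output order, for illustration only; confirm it against the model card.
ATTRIBUTES = ("arousal", "dominance", "valence")


def label_scores(scores: Sequence[float]) -> Dict[str, float]:
    """Attach attribute names to a three-element score vector.

    Raises ValueError if the vector has the wrong length or a value
    falls outside the 0 to 1 range described in the overview.
    """
    if len(scores) != len(ATTRIBUTES):
        raise ValueError(f"expected {len(ATTRIBUTES)} scores, got {len(scores)}")
    labeled: Dict[str, float] = {}
    for name, value in zip(ATTRIBUTES, scores):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} score {value} is outside [0, 1]")
        labeled[name] = float(value)
    return labeled


# Made-up scores, not real model output.
example = label_scores([0.62, 0.48, 0.35])
```

Validating the range at this boundary makes a mismatch (for example, unnormalized outputs from a different checkpoint) fail loudly instead of propagating silently.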

Model Features

Multi-Attribute Emotion Prediction
Predicts arousal, dominance, and valence together from one model, rather than one attribute at a time
Trained on MSP-Podcast Dataset
Trained on MSP-Podcast, a corpus of emotional speech drawn from podcast recordings
Standardized Audio Processing
Built-in mean/standard deviation normalization scales input audio consistently before prediction
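The mean/standard deviation normalization mentioned above can be sketched as follows. This is a minimal, dependency-free illustration of the general technique, not the model's own preprocessing code: the function name `normalize_waveform`, the small `eps` term, and the example statistics are all assumptions, and the real mean and standard deviation should come from the model itself.

```python
from typing import List, Sequence


def normalize_waveform(
    samples: Sequence[float],
    mean: float,
    std: float,
    eps: float = 1e-8,
) -> List[float]:
    """Shift and scale raw audio samples using a fixed mean and std.

    eps guards against division by zero when std is very small.
    """
    return [(sample - mean) / (std + eps) for sample in samples]


# Hypothetical statistics for illustration; use the values shipped with the model.
EXAMPLE_MEAN = 0.0
EXAMPLE_STD = 0.5

normalized = normalize_waveform([0.5, -0.5, 0.0], EXAMPLE_MEAN, EXAMPLE_STD)
```

Using fixed statistics, rather than recomputing them for every clip, keeps the input scale at inference time consistent with the scale used in training.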

Model Capabilities

Speech Emotion Recognition
Arousal Prediction
Dominance Prediction
Valence Prediction
Audio Classification

Use Cases

Affective Computing
Speech Emotion Analysis
Analyzes emotional states in speech for psychological research or user experience evaluation
Reports a score for each of the three emotional dimensions: arousal, dominance, and valence
Human-Computer Interaction
Intelligent Customer Service Emotion Recognition
Real-time identification of emotional states in user speech to optimize customer service response strategies