
WavLM Base Plus

Developed by Microsoft
WavLM is a large-scale self-supervised speech model developed by Microsoft. It was pretrained on speech audio sampled at 16 kHz, so input audio should also be sampled at 16 kHz. It is suitable for a range of speech processing tasks.
Downloads 673.32k
Release Time: 3/2/2022

Model Overview

WavLM is a pretrained speech model built on the HuBERT framework, with an emphasis on both spoken content modeling and speaker identity preservation. It performs strongly on the SUPERB benchmark and can be fine-tuned for downstream tasks such as speech recognition and speech classification.
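Because the model expects 16 kHz audio, it can help to know how many feature frames a clip of a given length produces. The sketch below is illustrative only: it assumes the convolutional feature-encoder layout used as the default in the Hugging Face transformers WavLMConfig (kernel sizes 10, 3, 3, 3, 3, 2, 2 and strides 5, 2, 2, 2, 2, 2, 2), which is not stated on this page. The function names are hypothetical.

```python
# Assumed values: default feature-encoder layout in transformers' WavLMConfig.
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)
SAMPLE_RATE = 16_000  # Hz, the sampling rate the model was pretrained on


def num_output_frames(num_samples: int) -> int:
    """Number of feature frames produced for `num_samples` raw audio samples."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        if length < kernel:
            return 0  # clip too short to pass through this layer
        # Standard output-length formula for an unpadded 1-D convolution.
        length = (length - kernel) // stride + 1
    return length


def frame_stride_seconds() -> float:
    """Time step between consecutive feature frames, in seconds."""
    total_stride = 1
    for stride in CONV_STRIDES:
        total_stride *= stride
    return total_stride / SAMPLE_RATE


if __name__ == "__main__":
    one_second = SAMPLE_RATE
    print(num_output_frames(one_second), "frames per second of audio")
    print(frame_stride_seconds(), "seconds between frames")
```

Under these assumptions, one second of 16 kHz audio yields 49 frames, one every 20 ms. Audio recorded at another rate would need resampling first, otherwise the frames would cover a different time span than the one seen during pretraining.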

Model Features

Large-scale Pretraining
The model was pretrained on 60,000 hours of Libri-Light, 10,000 hours of GigaSpeech, and 24,000 hours of VoxPopuli, for a total of 94,000 hours of audio.
Full-stack Speech Processing
Optimized for speech content modeling and speaker identity preservation, suitable for various speech processing tasks.
Mixed Speech Training
During pretraining, overlapping utterances are generated without supervision and added to the training data, which helps the model distinguish between speakers.
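The mixed speech idea can be sketched as follows. This is a minimal illustration of overlaying part of one utterance onto another, not the authors' implementation: the function name, the overlap ratio, and the gain are hypothetical placeholders, and the actual sampling probabilities and energy ratios used for WavLM are not given on this page.

```python
import random


def mix_utterances(primary, secondary, max_overlap_ratio=0.5, gain=0.5, rng=None):
    """Overlay a scaled chunk of `secondary` onto a random region of `primary`.

    `primary` and `secondary` are lists of float samples. The returned list
    has the same length as `primary`; the inputs are not modified.
    All default values are illustrative assumptions.
    """
    rng = rng or random.Random()
    if not primary or not secondary:
        return list(primary)

    # Keep the ratio in [0, 1] so the overlap never exceeds the primary clip.
    ratio = min(max(max_overlap_ratio, 0.0), 1.0)
    max_len = int(len(primary) * ratio)
    upper = max(1, min(max_len, len(secondary)))
    overlap_len = rng.randint(1, upper)

    # Pick where the chunk comes from and where it lands.
    src_start = rng.randint(0, len(secondary) - overlap_len)
    dst_start = rng.randint(0, len(primary) - overlap_len)

    mixed = list(primary)
    for i in range(overlap_len):
        mixed[dst_start + i] += gain * secondary[src_start + i]
    return mixed


if __name__ == "__main__":
    silence = [0.0] * 100
    other_speaker = [1.0] * 40
    mixed = mix_utterances(silence, other_speaker, rng=random.Random(0))
    print(sum(1 for x in mixed if x != 0.0), "samples now contain overlapping speech")
```

In a setup like this, the model would still be trained to predict targets for the primary utterance, so the overlaid speech acts as interference it has to learn to separate from the main speaker.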

Model Capabilities

Speech recognition
Speech classification
Speaker verification
Speaker diarization
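As a rough illustration of the speaker verification capability listed above, frame-level features can be pooled into one vector per utterance and compared with cosine similarity. This is a simplified sketch, not a description of how WavLM-based systems are actually built: the mean pooling, the function names, and the 0.7 threshold are assumptions, and practical systems typically train a speaker embedding head on top of the pretrained model and tune the threshold on held-out data.

```python
import math


def mean_pool(frames):
    """Average frame-level feature vectors into one utterance-level embedding."""
    count = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / count for d in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def same_speaker(frames_a, frames_b, threshold=0.7):
    """Accept the pair as the same speaker if similarity reaches the threshold.

    The threshold value is a hypothetical placeholder.
    """
    score = cosine_similarity(mean_pool(frames_a), mean_pool(frames_b))
    return score >= threshold


if __name__ == "__main__":
    # Toy 2-dimensional "features"; real features have hundreds of dimensions.
    enrolled = [[1.0, 0.0], [1.0, 0.2]]
    probe_match = [[0.9, 0.1]]
    probe_other = [[0.0, 1.0]]
    print(same_speaker(enrolled, probe_match))
    print(same_speaker(enrolled, probe_other))
```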

Use Cases

Speech recognition
English Speech-to-Text
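Convert English speech into text. The pretrained checkpoint was trained on audio alone and has no tokenizer, so it must first be fine-tuned on labeled text data.
WavLM reported state-of-the-art results on the SUPERB benchmark at the time of its release.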
Speech classification
Emotion Analysis
Analyze the speaker's emotional state through speech.