
WavLM Large

Developed by Microsoft
WavLM is a large-scale self-supervised speech pre-training model that supports full-stack speech processing tasks and achieves state-of-the-art results on the SUPERB benchmark.
Downloads: 396.53k
Release Time: 3/2/2022

Model Overview

A pre-trained model built on 16 kHz sampled speech audio. Its architecture and training design jointly model spoken content while preserving speaker identity, making it a strong backbone for a wide range of speech processing tasks. Input audio should likewise be sampled at 16 kHz.
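As a rough illustration, the sketch below loads the public microsoft/wavlm-large checkpoint through the Hugging Face transformers library and extracts frame-level features from a placeholder waveform. The random tensor stands in for real 16 kHz audio, and the zero-mean/unit-variance normalization is an assumption about the expected preprocessing rather than a documented requirement.

```python
import torch
from transformers import WavLMModel

# Load the pre-trained backbone from the Hugging Face Hub.
model = WavLMModel.from_pretrained("microsoft/wavlm-large")
model.eval()

# One second of placeholder audio at 16 kHz; replace with a real waveform.
waveform = torch.randn(1, 16000)
# Zero-mean / unit-variance normalization (assumed preprocessing for this checkpoint).
waveform = (waveform - waveform.mean()) / (waveform.std() + 1e-7)

with torch.no_grad():
    outputs = model(input_values=waveform)

# Contextual features of shape (batch, frames, hidden_size), roughly (1, 49, 1024) here,
# which downstream heads (ASR, speaker verification, classification) can consume.
print(outputs.last_hidden_state.shape)
```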

Model Features

Full-stack Speech Processing
Supports multiple speech tasks through a unified architecture, including speech recognition and speaker recognition
Large-scale Pre-training
Pre-trained on 94,000 hours of English speech drawn from the Libri-Light, GigaSpeech, and VoxPopuli datasets
Innovative Training Strategy
Employs an unsupervised utterance mixing training strategy to improve speaker discrimination (illustrated in the sketch after this list)
High Performance
Achieves state-of-the-art performance on the SUPERB benchmark
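The utterance mixing idea can be pictured with a simplified, self-contained sketch rather than the authors' implementation: a randomly cropped and attenuated secondary utterance is overlaid onto part of the primary one, producing overlapped speech from which the model must still recover the primary speaker's content. The function name and all hyperparameters below are illustrative.

```python
import torch

def mix_utterances(primary: torch.Tensor, secondary: torch.Tensor,
                   max_overlap: float = 0.5, attenuation_db: float = 5.0) -> torch.Tensor:
    """Overlay a random chunk of `secondary` onto a random region of `primary`
    (simplified illustration of utterance mixing; not the paper's exact recipe)."""
    overlap_len = int(len(primary) * max_overlap * torch.rand(1).item())
    if overlap_len == 0 or len(secondary) < overlap_len:
        return primary.clone()
    sec_start = torch.randint(0, len(secondary) - overlap_len + 1, (1,)).item()
    pri_start = torch.randint(0, len(primary) - overlap_len + 1, (1,)).item()
    chunk = secondary[sec_start:sec_start + overlap_len]
    # Attenuate the interfering chunk so the primary speaker stays dominant.
    scale = 10 ** (-attenuation_db / 20)
    mixed = primary.clone()
    mixed[pri_start:pri_start + overlap_len] += scale * chunk
    return mixed

primary = torch.randn(16000)    # placeholder 1 s utterance at 16 kHz
secondary = torch.randn(16000)  # placeholder interfering utterance
overlapped = mix_utterances(primary, secondary)
```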

Model Capabilities

Speech Feature Extraction
Speaker Recognition
Speech Content Understanding
Audio Classification

Use Cases

Speech Recognition
English Speech-to-Text
Converts English speech into text; the pre-trained checkpoint requires fine-tuning before use (a minimal sketch follows).
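Because the pre-trained checkpoint ships without a tokenizer or CTC head, it cannot transcribe audio out of the box. The sketch below shows one plausible way to attach a CTC head using transformers' WavLMForCTC; the vocabulary size, labels, and single optimization step are placeholders, not a full fine-tuning recipe.

```python
import torch
from transformers import WavLMForCTC

# Pre-trained encoder plus a randomly initialized CTC head.
# vocab_size=32 is a placeholder; it must match your character/token vocabulary.
model = WavLMForCTC.from_pretrained("microsoft/wavlm-large", vocab_size=32)

# Placeholder batch: one second of 16 kHz audio and a short label sequence.
input_values = torch.randn(1, 16000)
labels = torch.tensor([[5, 12, 7, 3]])

# With labels provided, the forward pass returns the CTC loss; backpropagate as usual.
loss = model(input_values=input_values, labels=labels).loss
loss.backward()
print(float(loss))
```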
Speaker Recognition
Speaker Verification
Verify a speaker's identity from speech (a minimal sketch follows).
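A minimal sketch of the verification flow using transformers' WavLMForXVector is shown below: two utterances are embedded and compared by cosine similarity, with scores above a tuned threshold treated as the same speaker. The x-vector head on top of this checkpoint is randomly initialized, so the scores only become meaningful after fine-tuning on speaker-labeled data; the waveforms here are placeholders.

```python
import torch
from transformers import WavLMForXVector

# WavLM Large with an x-vector speaker-embedding head (randomly initialized here).
model = WavLMForXVector.from_pretrained("microsoft/wavlm-large")
model.eval()

# Two placeholder 16 kHz utterances to compare.
audio = torch.randn(2, 16000)

with torch.no_grad():
    embeddings = model(input_values=audio).embeddings
embeddings = torch.nn.functional.normalize(embeddings, dim=-1)

# Cosine similarity between the two speaker embeddings.
similarity = torch.dot(embeddings[0], embeddings[1])
print(float(similarity))
```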
Audio Analysis
Audio Classification
Classify and identify audio content (a minimal sketch follows).
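A minimal sketch using transformers' WavLMForSequenceClassification, assuming a hypothetical four-class labeling scheme; the classification head is randomly initialized, so the model must be fine-tuned on labeled audio before its predictions are meaningful.

```python
import torch
from transformers import WavLMForSequenceClassification

# Hypothetical label set for illustration only.
labels = ["speech", "music", "noise", "silence"]
model = WavLMForSequenceClassification.from_pretrained(
    "microsoft/wavlm-large", num_labels=len(labels)
)
model.eval()

# Placeholder one-second clip at 16 kHz.
audio = torch.randn(1, 16000)
with torch.no_grad():
    logits = model(input_values=audio).logits

print(labels[int(logits.argmax(dim=-1))])
```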