
Wav2Vec2 Large 960h LV-60 Self

Developed by Facebook
A large Wav2Vec2 model from Facebook, pre-trained and fine-tuned on 960 hours of Libri-Light and LibriSpeech audio with a self-training objective, achieving state-of-the-art (SOTA) results on the LibriSpeech test sets.
Downloads: 56.00k
Release Time: 3/2/2022

Model Overview

A pre-trained model for automatic speech recognition (ASR) that learns speech representations from raw audio through self-supervised learning, then achieves high-precision speech-to-text conversion via fine-tuning.

Model Features

Self-supervised Pre-training
Learns speech representations in latent space through contrastive learning objectives, reducing reliance on labeled data
High-precision Recognition
Achieves SOTA word error rates of 1.9 / 3.9 WER on the LibriSpeech test-clean / test-other sets
Low-resource Adaptation
Requires only a small amount of labeled data for fine-tuning, outperforming traditional methods with just 1 hour of labeled data
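The WER figures above are word error rates: the word-level edit distance between hypothesis and reference, divided by the reference length. A minimal sketch of the metric:

```python
# Word error rate: Levenshtein distance over words / reference word count.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution ("cat" -> "bat") plus one deletion ("the"):
print(wer("the cat sat on the mat", "the bat sat on mat"))  # 2 edits / 6 words
```

A WER of 3.9 on test-other thus means roughly 4 word errors per 100 reference words.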

Model Capabilities

English speech recognition
16kHz audio processing
End-to-end speech-to-text
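The model expects 16 kHz mono input, so audio recorded at other rates must be resampled first. Production code should use a proper resampler (e.g. librosa or torchaudio, which apply anti-aliasing filters); the crude linear-interpolation sketch below only illustrates the idea:

```python
import numpy as np

def resample(audio: np.ndarray, orig_sr: int, target_sr: int = 16_000) -> np.ndarray:
    """Naive linear-interpolation resampling (real code should low-pass filter first)."""
    if orig_sr == target_sr:
        return audio
    duration = len(audio) / orig_sr
    n_out = int(round(duration * target_sr))
    t_in = np.arange(len(audio)) / orig_sr    # original sample times (s)
    t_out = np.arange(n_out) / target_sr      # target sample times (s)
    return np.interp(t_out, t_in, audio)

# One second of 44.1 kHz audio becomes 16,000 samples.
audio_44k = np.random.randn(44_100)
print(len(resample(audio_44k, 44_100)))  # → 16000
```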

Use Cases

Speech Transcription
Automated Meeting Minutes
Automatically converts English meeting recordings into text transcripts
High-accuracy transcription, reducing manual documentation costs
Podcast Subtitle Generation
Automatically generates subtitles for English podcast content
Supports batch processing with accuracy rates exceeding 96%
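Batch processing works by zero-padding variable-length clips to a common length and passing an attention mask so the model ignores the padding (with the Hugging Face `transformers` library, the processor's padding option handles this). A minimal NumPy sketch of that padding step:

```python
import numpy as np

def pad_batch(clips):
    """Zero-pad 1-D waveforms to the longest clip; mask marks real samples."""
    max_len = max(len(c) for c in clips)
    batch = np.zeros((len(clips), max_len), dtype=np.float32)
    mask = np.zeros((len(clips), max_len), dtype=np.int64)
    for row, clip in enumerate(clips):
        batch[row, : len(clip)] = clip
        mask[row, : len(clip)] = 1  # 1 = real audio, 0 = padding
    return batch, mask

clips = [np.ones(3, dtype=np.float32), np.ones(5, dtype=np.float32)]
batch, mask = pad_batch(clips)
print(batch.shape, mask.sum(axis=1))  # (2, 5) [3 5]
```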
Assistive Technology
Hearing Impairment Assistance
Real-time speech-to-text conversion for hearing-impaired individuals
Low-latency real-time conversion