
Wav2vec2 Base 960h

Developed by Facebook
The Wav2Vec2 base model developed by Facebook, pre-trained and fine-tuned on 960 hours of LibriSpeech audio for English automatic speech recognition tasks.
Downloads 2.1M
Release Time: 3/2/2022

Model Overview

This model is an automatic speech recognition (ASR) model that converts English speech into text. It is pre-trained and fine-tuned on the LibriSpeech dataset and expects audio input sampled at 16 kHz.
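As a rough illustration of the workflow described above, the following sketch transcribes a 16 kHz waveform with the Hugging Face `transformers` library. It assumes the model is published on the Hugging Face Hub as `facebook/wav2vec2-base-960h` and that `torch` and `transformers` are installed; the helper names `ensure_16khz` and `transcribe` are hypothetical and chosen only for this example.

```python
TARGET_SAMPLE_RATE = 16_000  # sampling rate the model expects, per the overview
MODEL_ID = "facebook/wav2vec2-base-960h"  # assumed Hugging Face Hub identifier


def ensure_16khz(sample_rate: int) -> None:
    """Raise ValueError if the audio is not sampled at 16 kHz."""
    if sample_rate != TARGET_SAMPLE_RATE:
        raise ValueError(
            f"expected {TARGET_SAMPLE_RATE} Hz audio, got {sample_rate} Hz; "
            "resample the audio before transcription"
        )


def transcribe(waveform, sample_rate: int) -> str:
    """Transcribe a mono waveform (1-D sequence of floats) into text."""
    ensure_16khz(sample_rate)

    # Heavy dependencies are imported lazily so that ensure_16khz can be
    # used without torch or transformers installed.
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
    model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)

    # Turn the raw waveform into model inputs (a batch of size 1).
    inputs = processor(waveform, sampling_rate=sample_rate, return_tensors="pt")

    with torch.no_grad():
        logits = model(inputs.input_values).logits

    # Greedy CTC decoding: pick the most likely token at every frame,
    # then let the processor collapse repeats and blanks into text.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Calling `transcribe(waveform, 16000)` on a mono recording loaded with any audio library should return the recognized text as a string. Audio recorded at another rate needs to be resampled first; the sketch deliberately raises an error instead of resampling silently.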

Model Features

Efficient speech recognition
Achieves a 3.4% word error rate (WER) on the LibriSpeech "clean" test set.
High performance with limited labeled data
With only ten minutes of labeled data, plus pre-training on 53k hours of unlabeled data, the underlying wav2vec 2.0 approach still achieves a WER of 4.8/8.2 on the LibriSpeech clean/other test sets.
16 kHz sampling rate support
The model is optimized for audio sampled at 16 kHz. Make sure input audio is at this rate, resampling it if necessary, before passing it to the model.
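The WER figures quoted in the features above follow the standard definition: the number of word substitutions, deletions, and insertions needed to turn the model output into the reference transcript, divided by the number of words in the reference. The sketch below is a minimal pure-Python illustration of that metric, not the evaluation script used to produce the reported numbers; the function name `word_error_rate` is hypothetical, and text normalization (case, punctuation) is left to the caller.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (word-level Levenshtein distance).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # matching i reference words against nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr

    return prev[-1] / len(ref)
```

For example, `word_error_rate("a b c d", "a x c d")` counts one substitution over four reference words. Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many inserted words.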

Model Capabilities

English speech recognition
Audio-to-text conversion
Automatic speech transcription

Use Cases

Speech transcription
Meeting minutes
Automatically convert meeting recordings into text transcripts
Highly accurate transcription results
Podcast transcription
Convert English podcast content into searchable text
Facilitates content retrieval and analysis
Assistive technology
Voice input system
Provides speech-to-text functionality for people with disabilities
Improves accessibility