
Wav2vec2 Large Xlsr 53 English

Developed by jonatasgrosman
An English speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the Common Voice 6.1 dataset.
Downloads 251.78k
Release Time: 3/2/2022

Model Overview

This is the large XLSR-53 model fine-tuned for English speech recognition. It converts English speech to text.

Model Features

High-performance English speech recognition
Achieves a 19.06% word error rate (WER) and a 7.69% character error rate (CER) on the Common Voice test set
Language model enhancement support
With a language model, WER drops to 14.81% and CER to 6.84%
16kHz sampling rate support
Optimized for 16kHz sampled speech input
Based on XLSR-53 pre-trained model
Leverages the advantages of large-scale cross-lingual speech representation (XLSR) pre-training
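The error rates quoted above are edit-distance metrics. As a minimal sketch of how such figures are typically computed (not the evaluation script used for this model, which the listing does not show), WER divides the word-level Levenshtein distance by the number of reference words, and CER does the same at character level. The function names and sample sentences below are illustrative assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference token
                curr[j - 1] + 1,     # insert a hypothesis token
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Illustrative sentences, not taken from Common Voice.
reference = "the cat sat on the mat"
hypothesis = "the cat sit on mat"
print(f"WER: {wer(reference, hypothesis):.2%}")
print(f"CER: {cer(reference, hypothesis):.2%}")
```

Real evaluations usually normalize case and punctuation before scoring, so numbers from this sketch are only comparable to the published ones if the same normalization is applied.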

Model Capabilities

English speech recognition
Speech-to-text conversion
Supports long audio processing (via chunking)
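The listing does not say how chunking is done. One common approach, sketched here as an assumption, splits the waveform into fixed-length windows that overlap slightly so that words cut at a boundary appear whole in at least one window. The function name, the 10-second window, and the 1-second overlap are hypothetical choices for illustration.

```python
def chunk_bounds(num_samples, sample_rate=16000, chunk_s=10.0, overlap_s=1.0):
    """Return (start, end) sample indices of overlapping windows covering the audio."""
    chunk = int(chunk_s * sample_rate)
    step = chunk - int(overlap_s * sample_rate)
    if chunk <= 0 or step <= 0:
        raise ValueError("chunk_s must be positive and larger than overlap_s")

    bounds = []
    start = 0
    while start < num_samples:
        end = min(start + chunk, num_samples)
        bounds.append((start, end))
        if end == num_samples:
            break  # the last window reached the end of the audio
        start += step
    return bounds


# 25 seconds of 16 kHz audio, split into 10-second windows with 1 second of overlap.
for start, end in chunk_bounds(25 * 16000):
    print(f"{start / 16000:.1f}s to {end / 16000:.1f}s")
```

Each window would then be transcribed separately, and the overlapping regions reconciled when the partial transcripts are joined. That merging step is not shown here.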

Use Cases

Speech transcription
Automatic meeting transcription
Automatically converts English meeting recordings into text transcripts
Approximately 80.94% word accuracy, taken as 100% minus the 19.06% WER on the Common Voice test set; results on real meeting audio may differ
Voice note conversion
Converts personal voice memos into searchable text
Assistive technology
Real-time caption generation
Generates real-time captions for English videos or live streams
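For the transcription use cases above, a minimal sketch using the Hugging Face transformers pipeline might look like the following. The Hub identifier is an assumption inferred from the developer and model names on this page, and the function name and example file path are hypothetical. Passing a file path also requires ffmpeg to be installed so the pipeline can decode the audio.

```python
# Assumed Hub identifier, inferred from the developer and model names on this page.
MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-english"


def transcribe_file(audio_path, chunk_length_s=10.0):
    """Transcribe an English audio file and return the recognized text."""
    # Imported lazily so that defining this function does not require transformers.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    # chunk_length_s makes the pipeline process long recordings in windows.
    result = asr(audio_path, chunk_length_s=chunk_length_s)
    return result["text"]


# Example call (hypothetical file name):
# text = transcribe_file("meeting.wav")
```

The first call downloads the model weights, which are large, so building the pipeline once and reusing it is preferable when transcribing many files.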