Wav2vec2 Base 960h

Developed by tommy19970714
Wav2Vec2 is a speech recognition model developed by Facebook. It is pre-trained with self-supervised learning, trained on the LibriSpeech dataset, and supports English speech-to-text tasks.
Downloads 19
Release Time: 3/2/2022

Model Overview

This model is an automatic speech recognition (ASR) system capable of converting English speech into text. Based on the Transformer architecture, it was trained on 960 hours of LibriSpeech data.
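
As a rough illustration of the speech-to-text flow described above, the following Python sketch uses the Hugging Face transformers library. The checkpoint name facebook/wav2vec2-base-960h, the 16 kHz input rate, the helper names check_sample_rate and transcribe, and the file name are assumptions that do not appear on this page; swap in the checkpoint you actually use.

```python
# Minimal sketch, not a definitive implementation.
# Assumptions: the checkpoint ID below and the 16 kHz input rate.
MODEL_ID = "facebook/wav2vec2-base-960h"  # assumed checkpoint name
EXPECTED_RATE = 16_000  # assumed sampling rate expected by the model


def check_sample_rate(rate: int) -> None:
    """Raise ValueError if the audio is not sampled at the expected rate."""
    if rate != EXPECTED_RATE:
        raise ValueError(
            f"expected {EXPECTED_RATE} Hz audio, got {rate} Hz; resample first"
        )


def transcribe(path: str) -> str:
    """Transcribe one audio file (hypothetical helper)."""
    # Heavy dependencies are imported lazily so the module loads without them.
    import soundfile as sf
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    audio, rate = sf.read(path, dtype="float32")
    check_sample_rate(rate)
    if audio.ndim > 1:
        # Downmix multi-channel audio to mono by averaging channels.
        audio = audio.mean(axis=1)

    processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
    model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)

    inputs = processor(audio, sampling_rate=rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits

    # Greedy decoding: pick the most likely token at every frame.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]


if __name__ == "__main__":
    print(transcribe("meeting.wav"))  # hypothetical file name
```

Loading the processor and model inside transcribe keeps the sketch short; a long-running service would more likely load them once and reuse them across calls.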

Model Features

Self-supervised Learning
Uses self-supervised learning for pre-training, reducing reliance on manually annotated data
High Accuracy
Achieves a word error rate (WER) of 3.4% on the LibriSpeech test-clean set and 8.6% on the test-other set
End-to-end Training
Learns directly from raw audio without requiring separate components found in traditional speech recognition systems
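
To make the WER figures above concrete, here is a small pure-Python sketch of how word error rate is commonly computed: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name word_error_rate and the example sentences are illustrative assumptions, and published results may apply text normalization that this sketch omits.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,  # deletion of a reference word
                    curr[j - 1] + 1,  # insertion of a hypothesis word
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 / 6.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.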

Model Capabilities

English speech recognition
Audio-to-text conversion
Speech transcription

Use Cases

Speech Transcription
Meeting Minutes
Automatically transcribes meeting recordings
Accuracy depends on audio quality; the 3.4% WER on clean LibriSpeech speech corresponds to roughly 96.6% word accuracy
Podcast Transcription
Converts podcast content into text
Assistive Technology
Real-time Caption Generation
Generates real-time captions for videos or live streams
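
One possible way to approach near-real-time captioning is to transcribe a stream in short, overlapping windows. The pure-Python sketch below only computes the window boundaries; the helper name iter_windows and the window and hop sizes are assumptions rather than anything specified by this page, and each window would still need to be passed to the recognizer.

```python
from typing import Iterator, Tuple


def iter_windows(num_samples: int, window: int, hop: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) sample indices of windows covering the audio.

    Consecutive windows overlap by (window - hop) samples; the final
    window is shortened so it ends exactly at num_samples.
    """
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    if hop > window:
        raise ValueError("hop must not exceed window, or samples are skipped")

    start = 0
    while start < num_samples:
        end = min(start + window, num_samples)
        yield start, end
        if end == num_samples:
            break
        start += hop


if __name__ == "__main__":
    sample_rate = 16_000  # assumed sampling rate
    # 5 s windows advanced by 4 s over 12 s of audio (illustrative sizes).
    for start, end in iter_windows(12 * sample_rate, 5 * sample_rate, 4 * sample_rate):
        print(f"{start / sample_rate:.1f}s to {end / sample_rate:.1f}s")
```

Overlapping windows reduce the chance of cutting a word at a boundary, at the cost of having to merge duplicated words between adjacent captions.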