Wav2Vec2 Large 960h Lv60 Self with Wikipedia LM
Developed by gxbag
An automatic speech recognition (ASR) system based on Facebook's wav2vec2-large-960h-lv60-self model, augmented with a language model trained on Wikipedia text
Downloads: 15
Release Time: 4/20/2022
Model Overview
This model combines Facebook's wav2vec2 speech recognition architecture with a 5-gram language model trained on Wikipedia text, enhancing the accuracy of speech-to-text conversion.
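Decoding with the attached language model requires the pyctcdecode and kenlm packages alongside transformers. The sketch below shows a minimal way to load such a checkpoint and transcribe a short clip; the Hub id and audio file name are assumptions made for illustration, so check the actual repository id before running it.

```python
# Minimal inference sketch. The model id below is an assumption based on the
# developer and model name; replace it with the real Hub repository id.
import torch
import librosa
from transformers import Wav2Vec2ProcessorWithLM, Wav2Vec2ForCTC

model_id = "gxbag/wav2vec2-large-960h-lv60-self-with-wikipedia-lm"  # assumed id
processor = Wav2Vec2ProcessorWithLM.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# Load audio and resample to the 16 kHz rate the acoustic model expects.
speech, _ = librosa.load("sample.wav", sr=16_000)

inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# batch_decode runs CTC beam-search decoding with the KenLM language model.
transcription = processor.batch_decode(logits.numpy()).text[0]
print(transcription)
```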
Model Features
Enhanced language model
Uses a 5-gram KenLM language model trained on the full Wikipedia text, improving recognition accuracy
Large-scale training
Trained on 960 hours of speech data and over 8 million words of text data
Optimized processing
The Wikipedia training text was cleaned by removing non-article content such as references and external links
Efficient pruning
Singleton n-grams of order 3 and higher were pruned from the language model to keep it compact (see the build sketch after this list)
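For reference, the sketch below shows one way a 5-gram KenLM model with this kind of pruning could be built from cleaned Wikipedia text using KenLM's command-line tools. It is not the author's exact recipe; the file names are placeholders, and it assumes the lmplz and build_binary binaries are installed.

```python
# Rough sketch: build a pruned 5-gram KenLM model from a cleaned text dump.
import subprocess

with open("wiki_clean.txt", "rb") as text, open("wiki_5gram.arpa", "wb") as arpa:
    subprocess.run(
        [
            "lmplz",
            "-o", "5",                 # 5-gram model
            "--prune", "0", "0", "1",  # keep all 1/2-grams, drop singleton 3-grams and above
        ],
        stdin=text,
        stdout=arpa,
        check=True,
    )

# Convert the ARPA file to KenLM's compact binary format for faster loading.
subprocess.run(["build_binary", "wiki_5gram.arpa", "wiki_5gram.bin"], check=True)
```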
Model Capabilities
English speech recognition
Long audio processing (supports chunked inference; see the example after this list)
High-accuracy transcription
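For long recordings, the transformers ASR pipeline can split the audio into overlapping chunks and stitch the transcriptions back together. A minimal sketch, again assuming the same hypothetical Hub id and a placeholder file name:

```python
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="gxbag/wav2vec2-large-960h-lv60-self-with-wikipedia-lm",  # assumed id
)

# chunk_length_s splits the audio into 10-second windows; stride_length_s keeps
# overlapping context on each side so words at chunk boundaries are not cut off.
result = asr("long_recording.wav", chunk_length_s=10, stride_length_s=(4, 2))
print(result["text"])
```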
Use Cases
Speech transcription
Meeting minutes
Automatically convert meeting recordings into text transcripts
Improves meeting documentation efficiency and facilitates later retrieval
Podcast transcription
Convert podcast content into text versions
Facilitates content indexing and SEO optimization
Assistive technology
Real-time caption generation
Generate real-time captions for videos or live streams
Enhances content accessibility