
Wav2Vec2 Large 960h LV60 Self with Wikipedia LM

Developed by gxbag
An automatic speech recognition (ASR) system based on Facebook's wav2vec2-large-960h-lv60-self model, extended with a Wikipedia-trained language model for improved transcription accuracy
Downloads 15
Release Time: 4/20/2022

Model Overview

This model combines Facebook's wav2vec2 speech recognition architecture with a 5-gram language model trained on Wikipedia text, enhancing the accuracy of speech-to-text conversion.
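The sketch below shows how such a checkpoint could be used for transcription with the Hugging Face transformers library. The repository id and audio file name are assumptions for illustration, and it presumes the checkpoint ships a KenLM file that Wav2Vec2ProcessorWithLM can load.

    # Minimal transcription sketch; repo id and file names are assumptions.
    import torch
    import librosa
    from transformers import Wav2Vec2ForCTC, Wav2Vec2ProcessorWithLM

    MODEL_ID = "gxbag/wav2vec2-large-960h-lv60-self-with-wikipedia-lm"  # assumed repo id

    processor = Wav2Vec2ProcessorWithLM.from_pretrained(MODEL_ID)
    model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)

    # Load audio at the 16 kHz sampling rate the acoustic model expects.
    speech, sample_rate = librosa.load("example.wav", sr=16_000)

    inputs = processor(speech, sampling_rate=sample_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits

    # batch_decode runs beam search with the attached 5-gram language model
    # instead of plain greedy CTC decoding.
    transcription = processor.batch_decode(logits.numpy()).text[0]
    print(transcription)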

Model Features

Enhanced language model
Uses a 5-gram KenLM language model trained on the full Wikipedia text, improving recognition accuracy
Large-scale training
The acoustic model was trained on 960 hours of speech data, and the language model on over 8 million words of text data
Optimized processing
Wikipedia data was cleaned by removing non-main content such as references and external links
Efficient pruning
Singleton n-grams of order 3 and above were pruned from the language model to keep it efficient (see the sketch after this list)
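As a rough illustration of the training and pruning described above, the following sketch builds a pruned 5-gram KenLM model and attaches it to the base model's CTC vocabulary. File names are illustrative, the kenlm lmplz binary must be installed separately, and pyctcdecode's build_ctcdecoder is used as one possible way to wire up the decoder.

    # Hedged sketch: build a pruned 5-gram LM and attach it to the CTC vocabulary.
    import subprocess
    from pyctcdecode import build_ctcdecoder
    from transformers import Wav2Vec2Processor

    # Train a 5-gram model on cleaned Wikipedia text; "--prune 0 0 1" drops
    # singleton n-grams of order 3 and above to keep the ARPA file compact.
    subprocess.run(
        "lmplz -o 5 --prune 0 0 1 < wikipedia_clean.txt > wikipedia_5gram.arpa",
        shell=True, check=True,
    )

    # Reuse the CTC vocabulary of the base acoustic model as decoder labels,
    # ordered by token id so they match the logit dimensions.
    base = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-large-960h-lv60-self")
    vocab = base.tokenizer.get_vocab()
    labels = [tok for tok, _ in sorted(vocab.items(), key=lambda kv: kv[1])]

    decoder = build_ctcdecoder(labels, kenlm_model_path="wikipedia_5gram.arpa")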

Model Capabilities

English speech recognition
Long audio processing (supports chunked inference; see the sketch after this list)
High-accuracy transcription
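For long recordings, the transformers ASR pipeline can split the audio into overlapping chunks. The sketch below assumes the same hypothetical repository id as above; chunk and stride lengths are illustrative values.

    # Long-audio sketch using chunked inference; repo id and file name are assumptions.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model="gxbag/wav2vec2-large-960h-lv60-self-with-wikipedia-lm",  # assumed repo id
        chunk_length_s=10,          # length of each audio chunk in seconds
        stride_length_s=(4, 2),     # overlap on the left/right of each chunk
    )

    result = asr("long_podcast_episode.wav")
    print(result["text"])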

Use Cases

Speech transcription
Meeting minutes
Automatically convert meeting recordings into text transcripts
Improves meeting documentation efficiency and facilitates later retrieval
Podcast transcription
Convert podcast content into text versions
Facilitates content indexing and SEO optimization
Assistive technology
Real-time caption generation
Generate real-time captions for videos or live streams
Enhances content accessibility