Wav2vec2 Large Xlsr 53 German Gpt2
Developed by jsnfly
This is an encoder-decoder automatic speech recognition model trained on the German subset of the mozilla-foundation/common_voice_7_0 dataset, combining the strengths of the Wav2Vec2 and GPT-2 architectures.
Downloads 28
Release Time: 3/2/2022
Model Overview
This model is designed for German automatic speech recognition, pairing a Wav2Vec2 encoder with a GPT-2 decoder to achieve efficient speech-to-text conversion.
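As a rough usage sketch, the checkpoint can presumably be loaded as a SpeechEncoderDecoderModel from the Transformers library; the Hub id, the use of a Wav2Vec2 feature extractor together with a GPT-2 tokenizer, and the 16 kHz input rate are assumptions based on the architecture described here, not confirmed details.

```python
import torch
import librosa
from transformers import SpeechEncoderDecoderModel, Wav2Vec2FeatureExtractor, GPT2Tokenizer

MODEL_ID = "jsnfly/wav2vec2-large-xlsr-53-german-gpt2"  # assumed Hub id

# Load the encoder-decoder checkpoint plus its (assumed) pre-/post-processing components.
model = SpeechEncoderDecoderModel.from_pretrained(MODEL_ID)
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_ID)
tokenizer = GPT2Tokenizer.from_pretrained(MODEL_ID)

# Load a German audio clip, resampled to the 16 kHz rate Wav2Vec2 expects.
speech, _ = librosa.load("example_de.wav", sr=16_000)
inputs = feature_extractor(speech, sampling_rate=16_000, return_tensors="pt")

# Generate token ids with the GPT-2 decoder and decode them to text.
with torch.no_grad():
    generated_ids = model.generate(inputs.input_values, max_length=128)

transcription = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(transcription)
```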
Model Features
Two-Stage Training
The cross-attention weights and the decoder are fine-tuned first, followed by end-to-end fine-tuning of the full model, balancing training efficiency and performance.
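A minimal sketch of what the two stages could look like with Transformers, assuming the model is assembled as a SpeechEncoderDecoderModel; the base checkpoint ids are placeholders, not details confirmed by this page.

```python
from transformers import SpeechEncoderDecoderModel

# Placeholder base checkpoints (assumptions): an XLSR-53 encoder and a German GPT-2 decoder.
model = SpeechEncoderDecoderModel.from_encoder_decoder_pretrained(
    "facebook/wav2vec2-large-xlsr-53", "dbmdz/german-gpt2"
)

# Stage 1: freeze the speech encoder so that only the decoder (which now also
# holds the freshly initialized cross-attention weights) receives gradients.
for param in model.encoder.parameters():
    param.requires_grad = False

# ... run stage-1 training here ...

# Stage 2: unfreeze everything and fine-tune the whole model end-to-end.
for param in model.parameters():
    param.requires_grad = True
```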
Position Embedding Optimization
Position embeddings are added to the encoder output and initialized with GPT-2's pre-trained position embeddings, which significantly improves performance.
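A sketch of this idea, assuming the encoder outputs have already been projected to the decoder's hidden size; the donor checkpoint "gpt2" and the wpe attribute (GPT-2's position-embedding table in Transformers) are used purely for illustration.

```python
import torch
import torch.nn as nn
from transformers import GPT2Model

gpt2 = GPT2Model.from_pretrained("gpt2")  # assumed donor of the pre-trained position table

# New position embeddings for the encoder output, initialized from GPT-2's `wpe` weights.
encoder_pos_emb = nn.Embedding(gpt2.config.n_positions, gpt2.config.n_embd)
encoder_pos_emb.weight.data.copy_(gpt2.wpe.weight.data)

def add_position_embeddings(encoder_hidden_states: torch.Tensor) -> torch.Tensor:
    # encoder_hidden_states: (batch, seq_len, hidden), already projected to the decoder size.
    positions = torch.arange(encoder_hidden_states.size(1), device=encoder_hidden_states.device)
    return encoder_hidden_states + encoder_pos_emb(positions)
```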
Resource Efficient
The first stage of training is suitable for small GPUs (e.g., 8GB VRAM), making it accessible for resource-constrained scenarios.
Model Capabilities
German Speech Recognition
High-Accuracy Speech-to-Text
Use Cases
Speech Transcription
German Speech-to-Text
Convert German speech content into text
Achieved a word error rate (WER) of 10.02% on the Common Voice 7 German test set.
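For reference, a word error rate like this can be computed with the evaluate library; the sentences below are placeholders, not data from the actual Common Voice evaluation.

```python
import evaluate

wer_metric = evaluate.load("wer")

# Placeholder transcripts: replace with model outputs and Common Voice reference texts.
predictions = ["ich gehe heute nach hause"]
references = ["ich gehe morgen nach hause"]

print(f"WER: {wer_metric.compute(predictions=predictions, references=references):.2%}")
```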
Voice Assistants
German Voice Command Recognition
Recognize and understand German voice commands