Wav2vec2 Large Xlsr 53 German Gpt2
Developed by jsnfly
This is an encoder-decoder automatic speech recognition model trained on the German subset of the mozilla-foundation/common_voice_7_0 dataset, combining the strengths of the Wav2Vec2 and GPT-2 architectures.
Downloads 28
Release Time: 3/2/2022
Model Overview
This model is designed for German automatic speech recognition, pairing a Wav2Vec2 encoder with a GPT-2 decoder to achieve efficient speech-to-text conversion.
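As a rough usage sketch, the checkpoint can presumably be loaded as a SpeechEncoderDecoderModel from the Transformers library; the Hub id, the use of a Wav2Vec2 feature extractor together with a GPT-2 tokenizer, and the 16 kHz input rate are assumptions based on the architecture described here, not confirmed details.

```python
import torch
import librosa
from transformers import SpeechEncoderDecoderModel, Wav2Vec2FeatureExtractor, GPT2Tokenizer

MODEL_ID = "jsnfly/wav2vec2-large-xlsr-53-german-gpt2"  # assumed Hub id

# Load the encoder-decoder checkpoint plus its (assumed) pre-/post-processing components.
model = SpeechEncoderDecoderModel.from_pretrained(MODEL_ID)
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_ID)
tokenizer = GPT2Tokenizer.from_pretrained(MODEL_ID)

# Load a German audio clip, resampled to the 16 kHz rate Wav2Vec2 expects.
speech, _ = librosa.load("example_de.wav", sr=16_000)
inputs = feature_extractor(speech, sampling_rate=16_000, return_tensors="pt")

# Generate token ids with the GPT-2 decoder and decode them to text.
with torch.no_grad():
    generated_ids = model.generate(inputs.input_values, max_length=128)

transcription = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(transcription)
```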
Model Features
Two-Stage Training
The cross-attention weights and the decoder are fine-tuned first, followed by end-to-end fine-tuning of the full model, balancing training efficiency and performance.
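A minimal sketch of what the two stages could look like with Transformers, assuming the model is assembled as a SpeechEncoderDecoderModel; the base checkpoint ids are placeholders, not details confirmed by this page.

```python
from transformers import SpeechEncoderDecoderModel

# Placeholder base checkpoints (assumptions): an XLSR-53 encoder and a German GPT-2 decoder.
model = SpeechEncoderDecoderModel.from_encoder_decoder_pretrained(
    "facebook/wav2vec2-large-xlsr-53", "dbmdz/german-gpt2"
)

# Stage 1: freeze the speech encoder so that only the decoder (which now also
# holds the freshly initialized cross-attention weights) receives gradients.
for param in model.encoder.parameters():
    param.requires_grad = False

# ... run stage-1 training here ...

# Stage 2: unfreeze everything and fine-tune the whole model end-to-end.
for param in model.parameters():
    param.requires_grad = True
```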
Position Embedding Optimization
Position embeddings are added to the encoder output and initialized with GPT-2's pre-trained position embeddings, which significantly improves performance.
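A sketch of this idea, assuming the encoder outputs have already been projected to the decoder's hidden size; the donor checkpoint "gpt2" and the wpe attribute (GPT-2's position-embedding table in Transformers) are used purely for illustration.

```python
import torch
import torch.nn as nn
from transformers import GPT2Model

gpt2 = GPT2Model.from_pretrained("gpt2")  # assumed donor of the pre-trained position table

# New position embeddings for the encoder output, initialized from GPT-2's `wpe` weights.
encoder_pos_emb = nn.Embedding(gpt2.config.n_positions, gpt2.config.n_embd)
encoder_pos_emb.weight.data.copy_(gpt2.wpe.weight.data)

def add_position_embeddings(encoder_hidden_states: torch.Tensor) -> torch.Tensor:
    # encoder_hidden_states: (batch, seq_len, hidden), already projected to the decoder size.
    positions = torch.arange(encoder_hidden_states.size(1), device=encoder_hidden_states.device)
    return encoder_hidden_states + encoder_pos_emb(positions)
```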
Resource Efficient
The first stage of training is suitable for small GPUs (e.g., 8GB VRAM), making it accessible for resource-constrained scenarios.
Model Capabilities
German Speech Recognition
High-Accuracy Speech-to-Text
Use Cases
Speech Transcription
German Speech-to-Text
Convert German speech content into text
Achieved a word error rate (WER) of 10.02% on the Common Voice 7 German test set.
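For reference, a word error rate like this can be computed with the evaluate library; the sentences below are placeholders, not data from the actual Common Voice evaluation.

```python
import evaluate

wer_metric = evaluate.load("wer")

# Placeholder transcripts: replace with model outputs and Common Voice reference texts.
predictions = ["ich gehe heute nach hause"]
references = ["ich gehe morgen nach hause"]

print(f"WER: {wer_metric.compute(predictions=predictions, references=references):.2%}")
```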
Voice Assistants
German Voice Command Recognition
Recognize and understand German voice commands