Wav2Vec2 Large XLSR-53 German CV9: A Free, Open-Source Model for German Automatic Speech Recognition

Wav2vec2 Large Xlsr 53 German Cv9

Developed by oliverguhr

This is an automatic speech recognition (ASR) model fine-tuned on the German Common Voice 9.0 dataset, based on Facebook's wav2vec2-large-xlsr-53 model.

Speech Recognition

Transformers

Language: German | Open Source License: Apache-2.0 | Tags: #German Speech Recognition #Low WER #XLSR Pretraining

Downloads 98

Release date: 6/13/2022

Model Overview

This model is designed for German speech recognition. Without a language model, it reaches a WER of 9.48% and a CER of 2.27% on the Common Voice 9.0 German test set.

Model Features

High-performance German Speech Recognition

Achieves a WER of 9.48% and a CER of 2.27% on the Common Voice 9.0 German test set.

Based on Large-scale Pretrained Model

Fine-tuned from Facebook's wav2vec2-large-xlsr-53 model, inheriting its powerful speech representation capabilities.

Supports Language Model Fusion

When decoding is combined with a language model (LM), WER drops further, from 9.48% to 7.49%.

Model Capabilities

German Speech Recognition

Speech-to-Text

Automatic Speech Transcription

Use Cases

Speech Transcription

German Speech Transcription

Convert German speech content into text

WER 9.48%, CER 2.27%

Voice Assistants

German Voice Command Recognition

Used for voice command recognition in German voice assistants

🚀 wav2vec2-large-xlsr-53-german-cv9

This is a speech recognition model fine-tuned for German, offering high-accuracy speech-to-text conversion.

🚀 Quick Start

This model is a fine-tuned version of ./facebook/wav2vec2-large-xlsr-53 on the MOZILLA-FOUNDATION/COMMON_VOICE_9_0 - DE dataset.

It achieves the following results on the test set:

CER: 2.273015898213336
WER: 9.480663281840769
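
To make the quick start concrete, here is a minimal transcription sketch using the Transformers `pipeline` API. It rests on several assumptions not stated in the source: that the model is published on the Hugging Face Hub under the ID `oliverguhr/wav2vec2-large-xlsr-53-german-cv9` (combined from the developer and model names), that `transformers` and `torch` are installed, and that `ffmpeg` is available so the pipeline can decode audio files. The helper names `load_asr` and `transcribe` are illustrative.

```python
# Assumed Hub ID, combined from the developer name and the model name.
MODEL_ID = "oliverguhr/wav2vec2-large-xlsr-53-german-cv9"


def load_asr(model_id: str = MODEL_ID):
    """Build an automatic-speech-recognition pipeline for the given model."""
    # Imported lazily so this module can be imported without loading
    # the heavy dependencies until a model is actually needed.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model_id)


def transcribe(audio_path: str, asr=None) -> str:
    """Return the transcription of one audio file.

    `asr` is any callable that maps a file path to a dict with a "text" key;
    when omitted, the pipeline for MODEL_ID is loaded (this downloads weights).
    """
    if asr is None:
        asr = load_asr()
    return asr(audio_path)["text"]
```

A typical call would be `transcribe("sample_de.wav")`, where the file name is a placeholder. wav2vec2 models expect 16 kHz mono audio; when given a file path, the pipeline resamples the input to the rate the model's feature extractor expects.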

✨ Features

Fine-tuned for German: Specifically optimized for the German language using the MOZILLA-FOUNDATION/COMMON_VOICE_9_0 - DE dataset.
Low error rates: Reaches a CER of about 2.27 and a WER of about 9.48 on the test set.

📚 Documentation

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 16
eval_batch_size: 32
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 50.0
mixed_precision_training: Native AMP
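
The batch-size and schedule figures above can be cross-checked with a little arithmetic. The sketch below assumes training ran on a single device (the source does not say), takes the 3557 optimizer steps per epoch from the training results table, and models the linear schedule as a warmup from zero to the peak rate followed by a linear decay to zero. The function name `lr_at` is illustrative, and the computed warmup length is an approximation of what the trainer would have used.

```python
# Values copied from the hyperparameter list.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 16
GRADIENT_ACCUMULATION_STEPS = 8
WARMUP_RATIO = 0.1
NUM_EPOCHS = 50

# Taken from the training results table: optimizer steps in one epoch.
STEPS_PER_EPOCH = 3557

# Assumption: one device, so the effective batch is per-device batch
# size multiplied by the number of accumulation steps.
effective_batch_size = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS

total_steps = STEPS_PER_EPOCH * NUM_EPOCHS
warmup_steps = round(total_steps * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step under linear warmup and decay."""
    if step < warmup_steps:
        return LEARNING_RATE * (step / warmup_steps)
    remaining = max(0, total_steps - step)
    return LEARNING_RATE * (remaining / (total_steps - warmup_steps))


print(effective_batch_size)  # 128, matching total_train_batch_size
print(total_steps)           # 177850, matching the last row of the table
print(warmup_steps)          # 17785, i.e. the end of epoch 5
```

Under these assumptions the learning rate peaks at 0.0001 at the end of epoch 5 and falls linearly to zero by step 177850.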

Training results

Training Loss	Epoch	Step	Validation Loss	Eval WER
0.4129	1.0	3557	0.3015	0.2499
0.2121	2.0	7114	0.1596	0.1567
0.1455	3.0	10671	0.1377	0.1354
0.1436	4.0	14228	0.1301	0.1282
0.1144	5.0	17785	0.1225	0.1245
0.1219	6.0	21342	0.1254	0.1208
0.104	7.0	24899	0.1198	0.1232
0.1016	8.0	28456	0.1149	0.1174
0.1093	9.0	32013	0.1186	0.1186
0.0858	10.0	35570	0.1182	0.1164
0.102	11.0	39127	0.1191	0.1186
0.0834	12.0	42684	0.1161	0.1096
0.0916	13.0	46241	0.1147	0.1107
0.0811	14.0	49798	0.1174	0.1136
0.0814	15.0	53355	0.1132	0.1114
0.0865	16.0	56912	0.1134	0.1097
0.0701	17.0	60469	0.1096	0.1054
0.0891	18.0	64026	0.1110	0.1076
0.071	19.0	67583	0.1141	0.1074
0.0726	20.0	71140	0.1094	0.1093
0.0647	21.0	74697	0.1088	0.1095
0.0643	22.0	78254	0.1105	0.1044
0.0764	23.0	81811	0.1072	0.1042
0.0605	24.0	85368	0.1095	0.1026
0.0722	25.0	88925	0.1144	0.1066
0.0597	26.0	92482	0.1087	0.1022
0.062	27.0	96039	0.1073	0.1027
0.0536	28.0	99596	0.1068	0.1027
0.0616	29.0	103153	0.1097	0.1037
0.0642	30.0	106710	0.1117	0.1020
0.0555	31.0	110267	0.1109	0.0990
0.0632	32.0	113824	0.1104	0.0977
0.0482	33.0	117381	0.1108	0.0958
0.0601	34.0	120938	0.1095	0.0957
0.0508	35.0	124495	0.1079	0.0973
0.0526	36.0	128052	0.1068	0.0967
0.0487	37.0	131609	0.1081	0.0966
0.0495	38.0	135166	0.1099	0.0956
0.0528	39.0	138723	0.1091	0.0923
0.0439	40.0	142280	0.1111	0.0928
0.0467	41.0	145837	0.1131	0.0943
0.0407	42.0	149394	0.1115	0.0944
0.046	43.0	152951	0.1106	0.0935
0.0447	44.0	156508	0.1083	0.0919
0.0434	45.0	160065	0.1093	0.0909
0.0472	46.0	163622	0.1092	0.0921
0.0414	47.0	167179	0.1106	0.0922
0.0501	48.0	170736	0.1094	0.0918
0.0388	49.0	174293	0.1099	0.0918
0.0428	50.0	177850	0.1103	0.0915
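
The WER column in the table above is expressed as a fraction (0.0915 corresponds to 9.15%), while the headline test-set figures are percentages. As a reminder of what these metrics measure, here is a minimal sketch of WER and CER as edit distance divided by reference length. It is not the evaluation script used for this model; the function names are illustrative, and text normalization (casing, punctuation), which affects reported scores, is left out.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference item missing in hypothesis
                curr[j - 1] + 1,     # insertion: extra item in hypothesis
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate for one utterance; the reference must be non-empty."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate for one utterance; the reference must be non-empty."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


print(wer("das ist ein test", "das ist test"))  # 0.25 (one deleted word out of four)
print(cer("haus", "maus"))                      # 0.25 (one substituted character out of four)
```

Corpus-level scores are usually computed by summing errors and reference lengths over all utterances before dividing, rather than by averaging per-utterance rates.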

Framework versions

Transformers 4.19.0.dev0
Pytorch 1.11.0+cu113
Datasets 2.0.0
Tokenizers 0.11.6

📄 License

This model is licensed under the Apache-2.0 license.

📦 Additional Information

Property	Details
Language	German
Tags	automatic-speech-recognition, mozilla-foundation/common_voice_9_0, generated_from_trainer
Datasets	mozilla-foundation/common_voice_9_0
Model Index	Name: wav2vec2-large-xlsr-53-german-cv9, with multiple ASR task results on different datasets and metrics
