wav2vec2-large-xlsr-53-german-cv9
This is a speech recognition model fine-tuned to improve speech-to-text accuracy on German audio.
Quick Start
This model is a fine-tuned version of ./facebook/wav2vec2-large-xlsr-53 on the MOZILLA-FOUNDATION/COMMON_VOICE_9_0 - DE dataset.
It achieves the following results on the test set:
- CER: 2.273015898213336
- WER: 9.480663281840769
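For readers unfamiliar with the two metrics, the following is a minimal sketch of how a character error rate (CER) and a word error rate (WER) are commonly defined: the edit distance between reference and hypothesis, divided by the reference length. The function names and the example sentences are illustrative assumptions. The evaluation script behind the numbers above is not shown in this card and may normalize text or aggregate over the corpus differently. The reported values appear to be percentages.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate as a fraction of the reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate as a fraction of the reference character count."""
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example pair, not taken from the test set.
reference = "das ist ein test"
hypothesis = "das ist ein text"
print(f"WER: {wer(reference, hypothesis) * 100:.2f}")
print(f"CER: {cer(reference, hypothesis) * 100:.2f}")
```

One wrong word out of four gives a high WER while the CER stays low, which is why the CER reported for a model is usually well below its WER.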
Features
- Fine-tuned for German: Specifically optimized for the German language using the MOZILLA-FOUNDATION/COMMON_VOICE_9_0 - DE dataset.
- Evaluation metrics: Reaches a CER of about 2.27 and a WER of about 9.48 on the test set.
Documentation
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 16
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 50.0
- mixed_precision_training: Native AMP
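Two of the values above follow from the others, and a short sketch makes the arithmetic explicit. It assumes a single training device, which is consistent with 16 x 8 = 128 but is not stated in the card. It also assumes the usual shape of a linear schedule with warmup: a linear ramp up to the peak learning rate, then a linear decay to zero. The function names are illustrative, and the total of 177850 optimizer steps is taken from the final row of the training log.

```python
def effective_batch_size(per_device, accumulation_steps, num_devices=1):
    """Samples contributing to one optimizer step (num_devices=1 is an assumption)."""
    return per_device * accumulation_steps * num_devices


def linear_lr(step, total_steps, base_lr=1e-4, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup followed by linear decay to zero."""
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: fall linearly from base_lr down to 0 at total_steps.
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


total_steps = 177850  # last step in the training log
print(effective_batch_size(16, 8))     # train_batch_size * gradient_accumulation_steps
print(linear_lr(17785, total_steps))   # end of warmup: peak learning rate
print(linear_lr(total_steps, total_steps))  # end of training
```

With a warmup ratio of 0.1, the warmup under these assumptions covers the first 17785 steps, which corresponds to the first five epochs of the log.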
Training results
| Training Loss | Epoch | Step | Validation Loss | Eval WER |
|---|---|---|---|---|
| 0.4129 | 1.0 | 3557 | 0.3015 | 0.2499 |
| 0.2121 | 2.0 | 7114 | 0.1596 | 0.1567 |
| 0.1455 | 3.0 | 10671 | 0.1377 | 0.1354 |
| 0.1436 | 4.0 | 14228 | 0.1301 | 0.1282 |
| 0.1144 | 5.0 | 17785 | 0.1225 | 0.1245 |
| 0.1219 | 6.0 | 21342 | 0.1254 | 0.1208 |
| 0.104 | 7.0 | 24899 | 0.1198 | 0.1232 |
| 0.1016 | 8.0 | 28456 | 0.1149 | 0.1174 |
| 0.1093 | 9.0 | 32013 | 0.1186 | 0.1186 |
| 0.0858 | 10.0 | 35570 | 0.1182 | 0.1164 |
| 0.102 | 11.0 | 39127 | 0.1191 | 0.1186 |
| 0.0834 | 12.0 | 42684 | 0.1161 | 0.1096 |
| 0.0916 | 13.0 | 46241 | 0.1147 | 0.1107 |
| 0.0811 | 14.0 | 49798 | 0.1174 | 0.1136 |
| 0.0814 | 15.0 | 53355 | 0.1132 | 0.1114 |
| 0.0865 | 16.0 | 56912 | 0.1134 | 0.1097 |
| 0.0701 | 17.0 | 60469 | 0.1096 | 0.1054 |
| 0.0891 | 18.0 | 64026 | 0.1110 | 0.1076 |
| 0.071 | 19.0 | 67583 | 0.1141 | 0.1074 |
| 0.0726 | 20.0 | 71140 | 0.1094 | 0.1093 |
| 0.0647 | 21.0 | 74697 | 0.1088 | 0.1095 |
| 0.0643 | 22.0 | 78254 | 0.1105 | 0.1044 |
| 0.0764 | 23.0 | 81811 | 0.1072 | 0.1042 |
| 0.0605 | 24.0 | 85368 | 0.1095 | 0.1026 |
| 0.0722 | 25.0 | 88925 | 0.1144 | 0.1066 |
| 0.0597 | 26.0 | 92482 | 0.1087 | 0.1022 |
| 0.062 | 27.0 | 96039 | 0.1073 | 0.1027 |
| 0.0536 | 28.0 | 99596 | 0.1068 | 0.1027 |
| 0.0616 | 29.0 | 103153 | 0.1097 | 0.1037 |
| 0.0642 | 30.0 | 106710 | 0.1117 | 0.1020 |
| 0.0555 | 31.0 | 110267 | 0.1109 | 0.0990 |
| 0.0632 | 32.0 | 113824 | 0.1104 | 0.0977 |
| 0.0482 | 33.0 | 117381 | 0.1108 | 0.0958 |
| 0.0601 | 34.0 | 120938 | 0.1095 | 0.0957 |
| 0.0508 | 35.0 | 124495 | 0.1079 | 0.0973 |
| 0.0526 | 36.0 | 128052 | 0.1068 | 0.0967 |
| 0.0487 | 37.0 | 131609 | 0.1081 | 0.0966 |
| 0.0495 | 38.0 | 135166 | 0.1099 | 0.0956 |
| 0.0528 | 39.0 | 138723 | 0.1091 | 0.0923 |
| 0.0439 | 40.0 | 142280 | 0.1111 | 0.0928 |
| 0.0467 | 41.0 | 145837 | 0.1131 | 0.0943 |
| 0.0407 | 42.0 | 149394 | 0.1115 | 0.0944 |
| 0.046 | 43.0 | 152951 | 0.1106 | 0.0935 |
| 0.0447 | 44.0 | 156508 | 0.1083 | 0.0919 |
| 0.0434 | 45.0 | 160065 | 0.1093 | 0.0909 |
| 0.0472 | 46.0 | 163622 | 0.1092 | 0.0921 |
| 0.0414 | 47.0 | 167179 | 0.1106 | 0.0922 |
| 0.0501 | 48.0 | 170736 | 0.1094 | 0.0918 |
| 0.0388 | 49.0 | 174293 | 0.1099 | 0.0918 |
| 0.0428 | 50.0 | 177850 | 0.1103 | 0.0915 |
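As a small reading aid for the log above, the sketch below picks the evaluation with the lowest WER from the last six epochs. The values are copied from the log. Treating the Eval WER column as a fraction, so that 0.0909 corresponds to 9.09 %, is an assumption, and the card does not say whether the final or the best checkpoint was published.

```python
# (epoch, validation_loss, eval_wer) for the last six evaluations in the log above
eval_log = [
    (45, 0.1093, 0.0909),
    (46, 0.1092, 0.0921),
    (47, 0.1106, 0.0922),
    (48, 0.1094, 0.0918),
    (49, 0.1099, 0.0918),
    (50, 0.1103, 0.0915),
]

# Select the row with the smallest eval WER.
best_epoch, _, best_wer = min(eval_log, key=lambda row: row[2])
print(f"best epoch: {best_epoch}, eval WER: {best_wer * 100:.2f}%")
```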
Framework versions
- Transformers 4.19.0.dev0
- PyTorch 1.11.0+cu113
- Datasets 2.0.0
- Tokenizers 0.11.6
License
This model is licensed under the Apache-2.0 license.
Additional Information
| Property | Details |
|---|---|
| Language | German |
| Tags | automatic-speech-recognition, mozilla-foundation/common_voice_9_0, generated_from_trainer |
| Datasets | mozilla-foundation/common_voice_9_0 |
| Model Index | Name: wav2vec2-large-xlsr-53-german-cv9, with multiple ASR task results on different datasets and metrics |