wav2vec2-xls-r-300m-uk
This is a fine-tuned model for automatic speech recognition. On the evaluation set it reaches a word error rate (WER) of 0.1222 and a character error rate (CER) of 0.0204.
Quick Start
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the Common Voice uk dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0927
- Wer: 0.1222
- Cer: 0.0204
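To try the model, a minimal sketch along the following lines may help. It assumes the `transformers` package is installed with a PyTorch backend and that ffmpeg is available for audio decoding. The card does not state the full Hub repository id of this checkpoint, so `model_id` must be supplied by the caller; `transcribe` is an illustrative helper name, not part of any library.

```python
def transcribe(audio_path: str, model_id: str) -> str:
    """Transcribe one audio file with a fine-tuned wav2vec2 checkpoint.

    model_id is an assumption: pass the full Hub repository id (or a local
    directory) of wav2vec2-xls-r-300m-uk; this card does not give the namespace.
    """
    # Imported lazily so this module loads even when transformers is absent.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # For a file path, the pipeline decodes the audio and resamples it to the
    # sampling rate the model's feature extractor expects.
    result = asr(audio_path)
    return result["text"]
```

A call would look like `transcribe("sample.wav", model_id)`, where `sample.wav` is a hypothetical local recording and `model_id` is the repository id or local path of the checkpoint.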
Features
- Automatic Speech Recognition: Specialized for automatic speech recognition tasks.
- Based on Common Voice: Trained with data from the Common Voice dataset.
Documentation
Training and Evaluation Data
More information needed
Training Procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 40
- eval_batch_size: 40
- seed: 42
- gradient_accumulation_steps: 6
- total_train_batch_size: 240
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 100
- num_epochs: 100
- mixed_precision_training: Native AMP
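The batch-size and scheduler entries above are related by simple arithmetic, sketched here. The sketch assumes a single device (40 × 6 = 240 matches the listed total) and that the linear scheduler warms up from zero over the warmup steps and then decays linearly to zero. The total number of optimizer steps is not stated in this card, so the `total_steps` value in the example is purely illustrative.

```python
def effective_batch_size(per_device: int, accumulation_steps: int, num_devices: int = 1) -> int:
    """Samples contributing to one optimizer update."""
    return per_device * accumulation_steps * num_devices


def linear_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Warmup phase: ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: ramp from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Values from the hyperparameter list; total_steps=10_000 is an assumption.
print(effective_batch_size(40, 6))
for step in (0, 50, 100, 5_000, 10_000):
    print(step, linear_lr(step, base_lr=3e-5, warmup_steps=100, total_steps=10_000))
```

With only 100 warmup steps, the learning rate reaches its peak of 3e-05 very early in the run, so nearly all of training happens in the decay phase.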
Training results
| Training Loss | Epoch | Step | Cer | Validation Loss | Wer |
|---|---|---|---|---|---|
| 9.0008 | 1.68 | 200 | 1.0 | 3.7590 | 1.0 |
| 3.4972 | 3.36 | 400 | 1.0 | 3.3933 | 1.0 |
| 3.3432 | 5.04 | 600 | 1.0 | 3.2617 | 1.0 |
| 3.2421 | 6.72 | 800 | 1.0 | 3.0712 | 1.0 |
| 1.9839 | 7.68 | 1000 | 0.1400 | 0.7204 | 0.6561 |
| 0.8017 | 9.36 | 1200 | 0.0766 | 0.3734 | 0.4159 |
| 0.5554 | 11.04 | 1400 | 0.0583 | 0.2621 | 0.3237 |
| 0.4309 | 12.68 | 1600 | 0.0486 | 0.2085 | 0.2753 |
| 0.3697 | 14.36 | 1800 | 0.0421 | 0.1746 | 0.2427 |
| 0.3293 | 16.04 | 2000 | 0.0388 | 0.1597 | 0.2243 |
| 0.2934 | 17.72 | 2200 | 0.0358 | 0.1428 | 0.2083 |
| 0.2704 | 19.4 | 2400 | 0.0333 | 0.1326 | 0.1949 |
| 0.2547 | 21.08 | 2600 | 0.0322 | 0.1255 | 0.1882 |
| 0.2366 | 22.76 | 2800 | 0.0309 | 0.1211 | 0.1815 |
| 0.2183 | 24.44 | 3000 | 0.0294 | 0.1159 | 0.1727 |
| 0.2115 | 26.13 | 3200 | 0.0280 | 0.1117 | 0.1661 |
| 0.1968 | 27.8 | 3400 | 0.0274 | 0.1063 | 0.1622 |
| 0.1922 | 29.48 | 3600 | 0.0269 | 0.1082 | 0.1598 |
| 0.1847 | 31.17 | 3800 | 0.0260 | 0.1061 | 0.1550 |
| 0.1715 | 32.84 | 4000 | 0.0252 | 0.1014 | 0.1496 |
| 0.1689 | 34.53 | 4200 | 0.0250 | 0.1012 | 0.1492 |
| 0.1655 | 36.21 | 4400 | 0.0243 | 0.0999 | 0.1450 |
| 0.1585 | 37.88 | 4600 | 0.0239 | 0.0967 | 0.1432 |
| 0.1492 | 39.57 | 4800 | 0.0237 | 0.0978 | 0.1421 |
| 0.1491 | 41.25 | 5000 | 0.0236 | 0.0963 | 0.1412 |
| 0.1453 | 42.93 | 5200 | 0.0230 | 0.0979 | 0.1373 |
| 0.1386 | 44.61 | 5400 | 0.0227 | 0.0959 | 0.1353 |
| 0.1387 | 46.29 | 5600 | 0.0226 | 0.0927 | 0.1355 |
| 0.1329 | 47.97 | 5800 | 0.0224 | 0.0951 | 0.1341 |
| 0.1295 | 49.65 | 6000 | 0.0219 | 0.0950 | 0.1306 |
| 0.1287 | 51.33 | 6200 | 0.0216 | 0.0937 | 0.1290 |
| 0.1277 | 53.02 | 6400 | 0.0215 | 0.0963 | 0.1294 |
| 0.1201 | 54.69 | 6600 | 0.0213 | 0.0959 | 0.1282 |
| 0.1199 | 56.38 | 6800 | 0.0215 | 0.0944 | 0.1286 |
| 0.1221 | 58.06 | 7000 | 0.0209 | 0.0938 | 0.1249 |
| 0.1145 | 59.68 | 7200 | 0.0208 | 0.0941 | 0.1254 |
| 0.1143 | 61.36 | 7400 | 0.0209 | 0.0941 | 0.1249 |
| 0.1143 | 63.04 | 7600 | 0.0209 | 0.0940 | 0.1248 |
| 0.1137 | 64.72 | 7800 | 0.0205 | 0.0931 | 0.1234 |
| 0.1125 | 66.4 | 8000 | 0.0204 | 0.0927 | 0.1222 |
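The Wer and Cer columns are word and character error rates: the edit distance between reference and predicted transcript, divided by the reference length in words or characters. The sketch below shows that definition in plain Python. It is not the evaluation code used for this model, which the card does not describe, and metric libraries typically aggregate edits over the whole evaluation set rather than scoring one utterance at a time. The function names and example strings are illustrative.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete an item from the reference
                curr[j - 1] + 1,     # insert an item from the hypothesis
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate for a single utterance."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate for a single utterance (spaces count as characters)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# One wrong word out of three; one wrong character out of eleven.
print(wer("the cat sat", "the cat sit"))
print(cer("the cat sat", "the cat sit"))
```

Because a single wrong character makes the whole word wrong, the WER is normally several times the CER, which matches the gap between the two columns in the table.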
Framework versions
- Transformers 4.25.1
- Pytorch 1.13.1+cu117
- Datasets 2.8.0
- Tokenizers 0.13.2
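When reproducing results, it can help to compare the local environment against the versions listed above. The following sketch uses only the standard library; `EXPECTED` and `compare_versions` are illustrative names, and `torch` is assumed to be the installed distribution name for PyTorch.

```python
from importlib import metadata

# Versions listed in this card; "torch" is the distribution name for PyTorch.
EXPECTED = {
    "transformers": "4.25.1",
    "torch": "1.13.1+cu117",
    "datasets": "2.8.0",
    "tokenizers": "0.13.2",
}


def compare_versions(expected: dict) -> dict:
    """Map each package name to (expected, installed); installed is None if absent."""
    report = {}
    for name, wanted in expected.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        report[name] = (wanted, found)
    return report


for name, (wanted, found) in compare_versions(EXPECTED).items():
    status = "ok" if found == wanted else "differs"
    print(f"{name}: expected {wanted}, installed {found} ({status})")
```

A mismatch does not necessarily prevent the model from loading; the check only flags where the environment differs from the one recorded at training time.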
License
This project is licensed under the MIT license.
Technical Details
Model Index
- Name: wav2vec2-xls-r-300m-uk
- Results:
  - Task:
    - Name: Speech Recognition
    - Type: automatic-speech-recognition
  - Dataset:
    - Name: Common Voice uk
    - Type: common_voice
    - Args: uk
  - Metrics:
    - Name: Test WER
    - Type: wer
    - Value: 12.22
Tags
- automatic-speech-recognition
- common_voice
- generated_from_trainer
Datasets