# xlsr-wav2vec2-2
This is a fine-tuned model based on the Transformer architecture that achieves good performance on speech-related tasks such as automatic speech recognition.
## Quick Start
This model is a fine-tuned version of facebook/wav2vec2-large-xlsr-53 on the None dataset. It achieves the following results on the evaluation set (values from the final row of the training results table below):
- Loss: 0.5884
- Wer: 0.4301
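A minimal inference sketch using the Transformers CTC API is shown below. The repository id and the audio file name are placeholders, since the card does not state where this checkpoint is published; the 16 kHz resampling follows the input rate expected by the wav2vec2-large-xlsr-53 base model.

```python
import torch
import librosa
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Placeholder repo id: replace with the actual path of this checkpoint.
MODEL_ID = "your-username/xlsr-wav2vec2-2"

processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)
model.eval()

# The wav2vec2-large-xlsr-53 base model expects 16 kHz mono input,
# so resample the (assumed) sample file accordingly.
speech, _ = librosa.load("sample.wav", sr=16_000, mono=True)

inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding: pick the most likely token per frame,
# then let the tokenizer collapse repeats and blank tokens.
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```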
## Technical Details
### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 0.0003
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 800
- num_epochs: 60
- mixed_precision_training: Native AMP
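
As a hedged illustration, the list above maps onto `transformers.TrainingArguments` (4.19.x) roughly as follows. The `output_dir` is a placeholder, and the Adam betas (0.9, 0.999) and epsilon 1e-08 are the library defaults, so they need no explicit arguments.

```python
from transformers import TrainingArguments

# Rough TrainingArguments equivalent of the hyperparameter list above.
training_args = TrainingArguments(
    output_dir="./xlsr-wav2vec2-2",    # placeholder output path
    learning_rate=3e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,     # effective train batch size: 8 * 2 = 16
    lr_scheduler_type="linear",
    warmup_steps=800,
    num_train_epochs=60,
    fp16=True,                         # "Native AMP" mixed-precision training
)
```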
### Training results
| Training Loss | Epoch | Step  | Validation Loss | Wer    |
|:-------------:|:-----:|:-----:|:---------------:|:------:|
| 6.6058        | 1.38  | 400   | 3.1894          | 1.0    |
| 2.3145        | 2.76  | 800   | 0.7193          | 0.7976 |
| 0.6737        | 4.14  | 1200  | 0.5338          | 0.6056 |
| 0.4651        | 5.52  | 1600  | 0.5699          | 0.6007 |
| 0.3968        | 6.9   | 2000  | 0.4608          | 0.5221 |
| 0.3281        | 8.28  | 2400  | 0.5264          | 0.5209 |
| 0.2937        | 9.65  | 2800  | 0.5366          | 0.5096 |
| 0.2619        | 11.03 | 3200  | 0.4902          | 0.5021 |
| 0.2394        | 12.41 | 3600  | 0.4706          | 0.4908 |
| 0.2139        | 13.79 | 4000  | 0.5526          | 0.4871 |
| 0.2034        | 15.17 | 4400  | 0.5396          | 0.5108 |
| 0.1946        | 16.55 | 4800  | 0.4959          | 0.4866 |
| 0.1873        | 17.93 | 5200  | 0.4898          | 0.4877 |
| 0.1751        | 19.31 | 5600  | 0.5488          | 0.4932 |
| 0.1668        | 20.69 | 6000  | 0.5645          | 0.4986 |
| 0.1638        | 22.07 | 6400  | 0.5367          | 0.4946 |
| 0.1564        | 23.45 | 6800  | 0.5282          | 0.4898 |
| 0.1566        | 24.83 | 7200  | 0.5489          | 0.4841 |
| 0.1522        | 26.21 | 7600  | 0.5439          | 0.4821 |
| 0.1378        | 27.59 | 8000  | 0.5796          | 0.4866 |
| 0.1459        | 28.96 | 8400  | 0.5603          | 0.4875 |
| 0.1406        | 30.34 | 8800  | 0.6773          | 0.5005 |
| 0.1298        | 31.72 | 9200  | 0.5858          | 0.4827 |
| 0.1268        | 33.1  | 9600  | 0.6007          | 0.4790 |
| 0.1204        | 34.48 | 10000 | 0.5716          | 0.4734 |
| 0.113         | 35.86 | 10400 | 0.5866          | 0.4748 |
| 0.1088        | 37.24 | 10800 | 0.5790          | 0.4752 |
| 0.1074        | 38.62 | 11200 | 0.5966          | 0.4721 |
| 0.1018        | 40.0  | 11600 | 0.5720          | 0.4668 |
| 0.0968        | 41.38 | 12000 | 0.5826          | 0.4698 |
| 0.0874        | 42.76 | 12400 | 0.5937          | 0.4634 |
| 0.0843        | 44.14 | 12800 | 0.6056          | 0.4640 |
| 0.0822        | 45.52 | 13200 | 0.5531          | 0.4569 |
| 0.0806        | 46.9  | 13600 | 0.5669          | 0.4484 |
| 0.072         | 48.28 | 14000 | 0.5683          | 0.4484 |
| 0.0734        | 49.65 | 14400 | 0.5735          | 0.4437 |
| 0.0671        | 51.03 | 14800 | 0.5455          | 0.4394 |
| 0.0617        | 52.41 | 15200 | 0.5838          | 0.4365 |
| 0.0607        | 53.79 | 15600 | 0.6233          | 0.4397 |
| 0.0593        | 55.17 | 16000 | 0.5649          | 0.4340 |
| 0.0551        | 56.55 | 16400 | 0.5923          | 0.4392 |
| 0.0503        | 57.93 | 16800 | 0.5858          | 0.4325 |
| 0.0496        | 59.31 | 17200 | 0.5884          | 0.4301 |
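
The `Wer` column is word error rate on the validation set (lower is better; it falls from 1.0 to roughly 0.43 over training). Below is a minimal sketch of how such a score can be computed with the pinned Datasets version; the example sentences are made up, and the underlying metric script additionally requires the `jiwer` package.

```python
from datasets import load_metric  # Datasets 2.2.x still ships load_metric

wer_metric = load_metric("wer")

# Toy example: one substitution ("the" vs. "a") over six reference words.
wer = wer_metric.compute(
    predictions=["the cat sat on the mat"],
    references=["the cat sat on a mat"],
)
print(f"WER: {wer:.4f}")  # 1/6 ≈ 0.1667
```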
### Framework versions
- Transformers 4.19.2
- Pytorch 1.11.0+cu113
- Datasets 2.2.2
- Tokenizers 0.12.1
## License
This project is licensed under the Apache-2.0 license.