🚀 wav2vec2-common_voice-tr-demo-dist
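This model is a fine-tuned version of facebook/wav2vec2-large-xlsr-53 on the COMMON_VOICE-TR dataset. It performs automatic speech recognition (ASR) on Turkish speech. Evaluation metrics are logged in the Training Results section; at the last logged checkpoint (step 6500) the model reached a validation loss of 0.3933 and a word error rate (WER) of 0.3303.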
📚 Documentation
Model Information
| Property | Details |
| --- | --- |
| Model Type | Fine-tuned version of facebook/wav2vec2-large-xlsr-53 on the COMMON_VOICE-TR dataset |
| Tags | automatic-speech-recognition, common_voice, generated_from_trainer |
| Datasets | common_voice |
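A checkpoint like the one described above can usually be loaded through the Transformers `pipeline` API. The following is a minimal sketch, not an official usage recipe: the repository ID is a hypothetical placeholder (this card does not state the Hub namespace), and decoding an audio file from a path assumes that `ffmpeg` is available on the system.

```python
# Hypothetical repository ID: the Hub namespace is not given in this card,
# so replace "your-namespace" with the actual owner of the checkpoint.
MODEL_ID = "your-namespace/wav2vec2-common_voice-tr-demo-dist"


def transcribe(audio_path, model_id=MODEL_ID):
    """Return the transcription of a single audio file as a string."""
    # Imported lazily so that defining this function does not require
    # transformers (or a network connection) to be available.
    from transformers import pipeline

    # The ASR pipeline bundles feature extraction, the CTC model and decoding.
    asr = pipeline("automatic-speech-recognition", model=model_id)
    return asr(audio_path)["text"]
```

Calling `transcribe("sample.wav")` (a hypothetical file name) downloads the checkpoint on first use, so it needs network access and enough memory for a wav2vec2-large model.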
License
This project is licensed under the Apache-2.0 license.
Training Procedure
Training Hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0003
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- total_train_batch_size: 8
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 15.0
- mixed_precision_training: Native AMP
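Two of the values above follow from the others, and the sketch below shows the arithmetic. The total batch sizes are the per-device sizes multiplied by the number of devices (assuming no gradient accumulation, since none is listed), and a linear scheduler with warmup ramps the learning rate from 0 to its peak over the warmup steps and then decays it linearly to 0. The `total_steps` value of 6525 is an assumption estimated from the step/epoch ratio in the training log, not a number stated in this card.

```python
def linear_schedule_with_warmup(step, base_lr=3e-4, warmup_steps=500, total_steps=6525):
    """Learning rate at a given optimizer step: linear warmup, then linear decay.

    total_steps=6525 is an estimate (about 435 steps per epoch over 15 epochs).
    """
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay phase: scale linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


# Per-device values from the hyperparameter list.
train_batch_size = 4
eval_batch_size = 8
num_devices = 2

# Effective batch sizes across all devices (no gradient accumulation assumed).
total_train_batch_size = train_batch_size * num_devices
total_eval_batch_size = eval_batch_size * num_devices

if __name__ == "__main__":
    print(total_train_batch_size, total_eval_batch_size)
    for step in (0, 250, 500, 3500, 6525):
        print(step, linear_schedule_with_warmup(step))
```

Under these assumptions the learning rate peaks at 0.0003 at step 500, which is the first point in the training log where the validation loss drops below 1.0.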
Training Results
| Training Loss | Epoch | Step | Validation Loss | WER |
| --- | --- | --- | --- | --- |
| 3.5459 | 0.23 | 100 | 3.6773 | 1.0 |
| 3.2247 | 0.46 | 200 | 3.1491 | 0.9999 |
| 2.3457 | 0.69 | 300 | 2.4236 | 1.0041 |
| 0.9149 | 0.92 | 400 | 0.9471 | 0.7684 |
| 0.6622 | 1.15 | 500 | 0.7518 | 0.6863 |
| 0.7205 | 1.38 | 600 | 0.6387 | 0.6402 |
| 0.6978 | 1.61 | 700 | 0.5611 | 0.5739 |
| 0.5317 | 1.84 | 800 | 0.5061 | 0.5418 |
| 0.5222 | 2.07 | 900 | 0.4839 | 0.5344 |
| 0.4467 | 2.3 | 1000 | 0.5060 | 0.5339 |
| 0.3196 | 2.53 | 1100 | 0.4619 | 0.5213 |
| 0.276 | 2.76 | 1200 | 0.4595 | 0.5020 |
| 0.3569 | 2.99 | 1300 | 0.4339 | 0.4901 |
| 0.2236 | 3.22 | 1400 | 0.4602 | 0.4887 |
| 0.293 | 3.45 | 1500 | 0.4376 | 0.4639 |
| 0.1677 | 3.68 | 1600 | 0.4371 | 0.4605 |
| 0.1838 | 3.91 | 1700 | 0.4116 | 0.4589 |
| 0.1225 | 4.14 | 1800 | 0.4144 | 0.4495 |
| 0.2301 | 4.37 | 1900 | 0.4250 | 0.4567 |
| 0.1931 | 4.6 | 2000 | 0.4081 | 0.4470 |
| 0.1427 | 4.83 | 2100 | 0.4295 | 0.4482 |
| 0.361 | 5.06 | 2200 | 0.4374 | 0.4445 |
| 0.3272 | 5.29 | 2300 | 0.4088 | 0.4258 |
| 0.3686 | 5.52 | 2400 | 0.4087 | 0.4258 |
| 0.3087 | 5.75 | 2500 | 0.4100 | 0.4371 |
| 0.4637 | 5.98 | 2600 | 0.4038 | 0.4219 |
| 0.1485 | 6.21 | 2700 | 0.4361 | 0.4197 |
| 0.1341 | 6.44 | 2800 | 0.4217 | 0.4132 |
| 0.1185 | 6.67 | 2900 | 0.4244 | 0.4097 |
| 0.1588 | 6.9 | 3000 | 0.4212 | 0.4181 |
| 0.0697 | 7.13 | 3100 | 0.3981 | 0.4073 |
| 0.0491 | 7.36 | 3200 | 0.3992 | 0.4010 |
| 0.088 | 7.59 | 3300 | 0.4206 | 0.4022 |
| 0.0731 | 7.82 | 3400 | 0.3998 | 0.3841 |
| 0.2767 | 8.05 | 3500 | 0.4195 | 0.3829 |
| 0.1725 | 8.28 | 3600 | 0.4167 | 0.3946 |
| 0.1242 | 8.51 | 3700 | 0.4177 | 0.3821 |
| 0.1133 | 8.74 | 3800 | 0.3993 | 0.3802 |
| 0.1952 | 8.97 | 3900 | 0.4132 | 0.3904 |
| 0.1399 | 9.2 | 4000 | 0.4010 | 0.3795 |
| 0.047 | 9.43 | 4100 | 0.4128 | 0.3703 |
| 0.049 | 9.66 | 4200 | 0.4319 | 0.3670 |
| 0.0994 | 9.89 | 4300 | 0.4118 | 0.3631 |
| 0.1209 | 10.11 | 4400 | 0.4296 | 0.3722 |
| 0.0484 | 10.34 | 4500 | 0.4130 | 0.3615 |
| 0.2065 | 10.57 | 4600 | 0.3958 | 0.3668 |
| 0.133 | 10.8 | 4700 | 0.4102 | 0.3679 |
| 0.0622 | 11.03 | 4800 | 0.4137 | 0.3585 |
| 0.0999 | 11.26 | 4900 | 0.4042 | 0.3583 |
| 0.0346 | 11.49 | 5000 | 0.4183 | 0.3573 |
| 0.072 | 11.72 | 5100 | 0.4060 | 0.3530 |
| 0.0365 | 11.95 | 5200 | 0.3968 | 0.3483 |
| 0.0615 | 12.18 | 5300 | 0.3958 | 0.3485 |
| 0.1067 | 12.41 | 5400 | 0.3987 | 0.3453 |
| 0.0253 | 12.64 | 5500 | 0.4182 | 0.3405 |
| 0.0636 | 12.87 | 5600 | 0.4199 | 0.3458 |
| 0.0506 | 13.1 | 5700 | 0.4056 | 0.3412 |
| 0.0944 | 13.33 | 5800 | 0.4061 | 0.3381 |
| 0.1187 | 13.56 | 5900 | 0.4113 | 0.3381 |
| 0.0237 | 13.79 | 6000 | 0.3973 | 0.3343 |
| 0.0166 | 14.02 | 6100 | 0.4001 | 0.3357 |
| 0.1189 | 14.25 | 6200 | 0.3931 | 0.3315 |
| 0.0375 | 14.48 | 6300 | 0.3944 | 0.3329 |
| 0.0537 | 14.71 | 6400 | 0.3953 | 0.3308 |
| 0.045 | 14.94 | 6500 | 0.3933 | 0.3303 |
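The WER values logged above are word error rates: the number of word substitutions, deletions and insertions needed to turn the predicted transcript into the reference, divided by the number of reference words. Because insertions count as errors, WER can exceed 1.0, as it does at step 300 (1.0041). The sketch below is a plain-Python illustration of the metric for a single utterance; it is not the evaluation code used to produce this log, and the example sentences are made up.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (row-by-row Levenshtein distance).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("merhaba dünya", "merhaba dünya"))  # identical transcripts
    print(word_error_rate("merhaba dünya", "merhaba"))        # one deleted word
    print(word_error_rate("merhaba", "mer ha ba"))            # insertions push WER above 1
```

Corpus-level WER is normally computed by summing the errors and the reference word counts over all utterances before dividing, rather than by averaging per-utterance values.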
Framework Versions
- Transformers 4.18.0
- PyTorch 1.9.1+cu102
- Datasets 1.13.3
- Tokenizers 0.11.6
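When trying to reproduce the run, it can help to compare the local environment against the versions listed above. The following is a small sketch using only the standard library; the PyPI distribution names (`torch` for PyTorch, for instance) are the usual ones but are an assumption here, as the card lists only display names.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card, keyed by the assumed PyPI distribution name.
EXPECTED = {
    "transformers": "4.18.0",
    "torch": "1.9.1+cu102",
    "datasets": "1.13.3",
    "tokenizers": "0.11.6",
}


def compare_versions(expected=EXPECTED):
    """Map each package to (expected version, installed version or None, exact match)."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, matches) in compare_versions().items():
        status = "ok" if matches else "differs"
        print(f"{name}: expected {wanted}, installed {installed} ({status})")
```

An exact string match is stricter than necessary in practice; newer releases may work, but that is not something this card verifies.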