đ xls-asr-vi-40h
This model is a fine - tuned version of facebook/wav2vec2-xls-r-300m on the common voice 7.0 vi & private dataset. It's designed for automatic speech recognition and can be used to transcribe Vietnamese speech.
đ Quick Start
Evaluation
To evaluate the model, you need to run the eval.py
file. Use the following command:
!python eval_custom.py --model_id geninhu/xls-asr-vi-40h --dataset mozilla-foundation/common_voice_7_0 --config vi --split test
⨠Features
- Based on a pre - trained model facebook/wav2vec2-xls-r-300m, fine - tuned on specific datasets.
- Achieved certain results on the evaluation set, such as a Wer of 60.58 (Without Language Model).
đ Documentation
Model Performance
It achieves the following results on the evaluation set (Without Language Model):
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
Property |
Details |
learning_rate |
5e - 06 |
train_batch_size |
16 |
eval_batch_size |
8 |
seed |
42 |
optimizer |
Adam with betas=(0.9,0.999) and epsilon = 1e - 08 |
lr_scheduler_type |
linear |
lr_scheduler_warmup_steps |
1500 |
num_epochs |
50.0 |
mixed_precision_training |
Native AMP |
Training results
Training Loss |
Epoch |
Step |
Validation Loss |
Wer |
23.3878 |
0.93 |
1500 |
21.9179 |
1.0 |
8.8862 |
1.85 |
3000 |
6.0599 |
1.0 |
4.3701 |
2.78 |
4500 |
4.3837 |
1.0 |
4.113 |
3.7 |
6000 |
4.2698 |
0.9982 |
3.9666 |
4.63 |
7500 |
3.9726 |
0.9989 |
3.5965 |
5.56 |
9000 |
3.7124 |
0.9975 |
3.3944 |
6.48 |
10500 |
3.5005 |
1.0057 |
3.304 |
7.41 |
12000 |
3.3710 |
1.0043 |
3.2482 |
8.33 |
13500 |
3.4201 |
1.0155 |
3.212 |
9.26 |
15000 |
3.3732 |
1.0151 |
3.1778 |
10.19 |
16500 |
3.2763 |
1.0009 |
3.1027 |
11.11 |
18000 |
3.1943 |
1.0025 |
2.9905 |
12.04 |
19500 |
2.8082 |
0.9703 |
2.7095 |
12.96 |
21000 |
2.4993 |
0.9302 |
2.4862 |
13.89 |
22500 |
2.3072 |
0.9140 |
2.3271 |
14.81 |
24000 |
2.1398 |
0.8949 |
2.1968 |
15.74 |
25500 |
2.0594 |
0.8817 |
2.111 |
16.67 |
27000 |
1.9404 |
0.8630 |
2.0387 |
17.59 |
28500 |
1.8895 |
0.8497 |
1.9504 |
18.52 |
30000 |
1.7961 |
0.8315 |
1.9039 |
19.44 |
31500 |
1.7433 |
0.8213 |
1.8342 |
20.37 |
33000 |
1.6790 |
0.7994 |
1.7824 |
21.3 |
34500 |
1.6291 |
0.7825 |
1.7359 |
22.22 |
36000 |
1.5783 |
0.7706 |
1.7053 |
23.15 |
37500 |
1.5248 |
0.7492 |
1.6504 |
24.07 |
39000 |
1.4930 |
0.7406 |
1.6263 |
25.0 |
40500 |
1.4572 |
0.7348 |
1.5893 |
25.93 |
42000 |
1.4202 |
0.7161 |
1.5669 |
26.85 |
43500 |
1.3987 |
0.7143 |
1.5277 |
27.78 |
45000 |
1.3512 |
0.6991 |
1.501 |
28.7 |
46500 |
1.3320 |
0.6879 |
1.4781 |
29.63 |
48000 |
1.3112 |
0.6788 |
1.4477 |
30.56 |
49500 |
1.2850 |
0.6657 |
1.4483 |
31.48 |
51000 |
1.2813 |
0.6633 |
1.4065 |
32.41 |
52500 |
1.2475 |
0.6541 |
1.3779 |
33.33 |
54000 |
1.2244 |
0.6503 |
1.3788 |
34.26 |
55500 |
1.2116 |
0.6407 |
1.3428 |
35.19 |
57000 |
1.1938 |
0.6352 |
1.3453 |
36.11 |
58500 |
1.1927 |
0.6340 |
1.3137 |
37.04 |
60000 |
1.1699 |
0.6252 |
1.2984 |
37.96 |
61500 |
1.1666 |
0.6229 |
1.2927 |
38.89 |
63000 |
1.1585 |
0.6188 |
1.2919 |
39.81 |
64500 |
1.1618 |
0.6190 |
1.293 |
40.74 |
66000 |
1.1479 |
0.6181 |
1.2853 |
41.67 |
67500 |
1.1423 |
0.6202 |
1.2687 |
42.59 |
69000 |
1.1315 |
0.6131 |
1.2603 |
43.52 |
70500 |
1.1333 |
0.6128 |
1.2577 |
44.44 |
72000 |
1.1191 |
0.6079 |
1.2435 |
45.37 |
73500 |
1.1177 |
0.6079 |
1.251 |
46.3 |
75000 |
1.1211 |
0.6092 |
1.2482 |
47.22 |
76500 |
1.1177 |
0.6060 |
1.2422 |
48.15 |
78000 |
1.1227 |
0.6097 |
1.2485 |
49.07 |
79500 |
1.1187 |
0.6071 |
1.2425 |
50.0 |
81000 |
1.1177 |
0.6058 |
Framework versions
- Transformers 4.16.0.dev0
- Pytorch 1.10.1+cu102
- Datasets 1.17.1.dev0
- Tokenizers 0.11.0
đ License
This model is licensed under the Apache - 2.0 license.