Wav2Vec2 XLS-R 300M Korean
Wav2Vec2 XLS-R 300M Korean is an automatic speech recognition model based on the XLS-R architecture. It is a fine-tuned version of Wav2Vec2-XLS-R-300M, trained on the Zeroth Korean dataset for Korean speech recognition.
Quick Start
This model was trained using HuggingFace's PyTorch framework as part of the Robust Speech Challenge Event organized by HuggingFace. All training was done on a Tesla V100 GPU, sponsored by OVH. All scripts used for training can be found in the Files and versions tab, along with the Training metrics logged via TensorBoard.
⨠Features
- Based on the advanced XLS - R architecture, suitable for automatic speech recognition tasks.
- Fine - tuned on the Zeroth Korean dataset, optimized for the Korean language.
Installation
No installation steps were provided in the original README.
Usage Examples
No code examples were provided in the original README.
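Since no usage code ships with the model, the following is only a minimal sketch of how a fine-tuned Wav2Vec2 checkpoint such as this one is commonly loaded through the `transformers` `pipeline` API. It assumes `transformers`, `torch`, and `ffmpeg` are installed (for example via `pip install transformers torch`). `MODEL_ID` is a placeholder, because this README does not state the Hub repository ID, and `transcribe` is a helper name introduced here for illustration.

```python
import sys

# Placeholder repository ID (an assumption): this README does not state it,
# so replace it with the ID shown on the model page.
MODEL_ID = "your-namespace/wav2vec2-xls-r-300m-korean"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file with the generic ASR pipeline."""
    # Imported lazily so this module can be loaded without the heavy dependency.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # Given a file path, the pipeline decodes the audio with ffmpeg and
    # resamples it to the sampling rate expected by the feature extractor.
    result = asr(audio_path)
    return result["text"]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python transcribe.py path/to/korean_speech.wav
    print(transcribe(sys.argv[1]))
```

The pipeline is used here because it bundles audio loading, feature extraction, and CTC decoding in one call. For batch evaluation, loading the processor and model classes directly may give more control.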
Documentation
Model
| Property | Details |
| --- | --- |
| Model Type | wav2vec2-xls-r-300m-korean |
| #params | 300M |
| Arch. | XLS-R |
| Training/Validation data (text) | Zeroth Korean Dataset |
Evaluation Results
The model achieves the following results on evaluation:
| Dataset | Loss | WER | CER |
| --- | --- | --- | --- |
| Zeroth Korean | 0.2089 | 29.54% | 9.53% |
| Robust Speech Event - Dev Data | N/A | 76.26% | 38.67% |
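The evaluation script behind these numbers is not reproduced here. As a rough illustration of what WER and CER measure, the sketch below implements the standard definitions: the Levenshtein edit distance between reference and hypothesis, divided by the reference length, counted over words for WER and over characters for CER. The function names are introduced for illustration, and keeping spaces as characters in `cer` is an assumption. Text normalization, which real evaluation scripts usually apply, is left out.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over the reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; spaces are kept as characters (an assumption)."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    print(wer("a b c d", "a x c d"))  # one substitution out of four words: 0.25
    print(cer("abcd", "abxd"))        # one substitution out of four characters: 0.25
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference, which is why a value such as 1.0012 can appear early in training.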
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 7.5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2000
- num_epochs: 50.0
- mixed_precision_training: Native AMP
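Two of these settings follow from the others, and a short sketch may make the relationship concrete. The total train batch size is the per-device batch size times the gradient accumulation steps (assuming a single GPU, consistent with the Tesla V100 mentioned above), and a linear scheduler with warmup ramps the learning rate up from zero and then decays it linearly back to zero. `TOTAL_STEPS` is an estimate rather than a logged value: the training log reaches step 34500 at epoch 49.64, which suggests roughly 695 steps per epoch. The function names are introduced for illustration.

```python
LEARNING_RATE = 7.5e-05
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 4
WARMUP_STEPS = 2000
# Estimate (an assumption): about 695 optimizer steps per epoch over 50 epochs.
TOTAL_STEPS = 34_750


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Number of samples contributing to each optimizer step."""
    return per_device * accum_steps * num_devices


def linear_warmup_lr(step: int,
                     base_lr: float = LEARNING_RATE,
                     warmup_steps: int = WARMUP_STEPS,
                     total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at a given optimizer step under linear warmup and linear decay."""
    if step < warmup_steps:
        # Warmup: scale linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: scale linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
    for step in (0, 1000, 2000, 18_375, TOTAL_STEPS):
        print(step, linear_warmup_lr(step))
```

Under these assumptions the learning rate peaks at 7.5e-05 at step 2000 and reaches zero at the final step.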
Training results
| Training Loss | Epoch | Step | Validation Loss | WER | CER |
| --- | --- | --- | --- | --- | --- |
| 19.7138 | 0.72 | 500 | 19.6427 | 1.0 | 1.0 |
| 4.8039 | 1.44 | 1000 | 4.7842 | 1.0 | 1.0 |
| 4.5619 | 2.16 | 1500 | 4.5608 | 0.9992 | 0.9598 |
| 4.254 | 2.88 | 2000 | 4.2729 | 0.9955 | 0.9063 |
| 4.1905 | 3.6 | 2500 | 4.2257 | 0.9903 | 0.8758 |
| 4.0683 | 4.32 | 3000 | 3.9294 | 0.9937 | 0.7911 |
| 3.486 | 5.04 | 3500 | 2.7045 | 1.0012 | 0.5934 |
| 2.946 | 5.75 | 4000 | 1.9691 | 0.9425 | 0.4634 |
| 2.634 | 6.47 | 4500 | 1.5212 | 0.8807 | 0.3850 |
| 2.4066 | 7.19 | 5000 | 1.2551 | 0.8177 | 0.3601 |
| 2.2651 | 7.91 | 5500 | 1.0423 | 0.7650 | 0.3039 |
| 2.1828 | 8.63 | 6000 | 0.9599 | 0.7273 | 0.3106 |
| 2.1023 | 9.35 | 6500 | 0.9482 | 0.7161 | 0.3063 |
| 2.0536 | 10.07 | 7000 | 0.8242 | 0.6767 | 0.2860 |
| 1.9803 | 10.79 | 7500 | 0.7643 | 0.6563 | 0.2637 |
| 1.9468 | 11.51 | 8000 | 0.7319 | 0.6441 | 0.2505 |
| 1.9178 | 12.23 | 8500 | 0.6937 | 0.6320 | 0.2489 |
| 1.8515 | 12.95 | 9000 | 0.6443 | 0.6053 | 0.2196 |
| 1.8083 | 13.67 | 9500 | 0.6286 | 0.6122 | 0.2148 |
| 1.819 | 14.39 | 10000 | 0.6015 | 0.5986 | 0.2074 |
| 1.7684 | 15.11 | 10500 | 0.5682 | 0.5741 | 0.1982 |
| 1.7195 | 15.83 | 11000 | 0.5385 | 0.5592 | 0.2007 |
| 1.7044 | 16.55 | 11500 | 0.5362 | 0.5524 | 0.2097 |
| 1.6879 | 17.27 | 12000 | 0.5119 | 0.5489 | 0.2083 |
| 1.656 | 17.98 | 12500 | 0.4990 | 0.5362 | 0.1968 |
| 1.6122 | 18.7 | 13000 | 0.4561 | 0.5092 | 0.1900 |
| 1.5919 | 19.42 | 13500 | 0.4778 | 0.5225 | 0.1975 |
| 1.5896 | 20.14 | 14000 | 0.4563 | 0.5098 | 0.1859 |
| 1.5589 | 20.86 | 14500 | 0.4362 | 0.4940 | 0.1725 |
| 1.5353 | 21.58 | 15000 | 0.4140 | 0.4826 | 0.1580 |
| 1.5441 | 22.3 | 15500 | 0.4031 | 0.4742 | 0.1550 |
| 1.5116 | 23.02 | 16000 | 0.3916 | 0.4748 | 0.1545 |
| 1.4731 | 23.74 | 16500 | 0.3841 | 0.4810 | 0.1542 |
| 1.4647 | 24.46 | 17000 | 0.3752 | 0.4524 | 0.1475 |
| 1.4328 | 25.18 | 17500 | 0.3587 | 0.4476 | 0.1461 |
| 1.4129 | 25.9 | 18000 | 0.3429 | 0.4242 | 0.1366 |
| 1.4062 | 26.62 | 18500 | 0.3450 | 0.4251 | 0.1355 |
| 1.3928 | 27.34 | 19000 | 0.3297 | 0.4145 | 0.1322 |
| 1.3906 | 28.06 | 19500 | 0.3210 | 0.4185 | 0.1336 |
| 1.358 | 28.78 | 20000 | 0.3131 | 0.3970 | 0.1275 |
| 1.3445 | 29.5 | 20500 | 0.3069 | 0.3920 | 0.1276 |
| 1.3159 | 30.22 | 21000 | 0.3035 | 0.3961 | 0.1255 |
| 1.3044 | 30.93 | 21500 | 0.2952 | 0.3854 | 0.1242 |
| 1.3034 | 31.65 | 22000 | 0.2966 | 0.3772 | 0.1227 |
| 1.2963 | 32.37 | 22500 | 0.2844 | 0.3706 | 0.1208 |
| 1.2765 | 33.09 | 23000 | 0.2841 | 0.3567 | 0.1173 |
| 1.2438 | 33.81 | 23500 | 0.2734 | 0.3552 | 0.1137 |
| 1.2487 | 34.53 | 24000 | 0.2703 | 0.3502 | 0.1118 |
| 1.2249 | 35.25 | 24500 | 0.2650 | 0.3484 | 0.1142 |
| 1.2229 | 35.97 | 25000 | 0.2584 | 0.3374 | 0.1097 |
| 1.2374 | 36.69 | 25500 | 0.2568 | 0.3337 | 0.1095 |
| 1.2153 | 37.41 | 26000 | 0.2494 | 0.3327 | 0.1071 |
| 1.1925 | 38.13 | 26500 | 0.2518 | 0.3366 | 0.1077 |
| 1.1908 | 38.85 | 27000 | 0.2437 | 0.3272 | 0.1057 |
| 1.1858 | 39.57 | 27500 | 0.2396 | 0.3265 | 0.1044 |
| 1.1808 | 40.29 | 28000 | 0.2373 | 0.3156 | 0.1028 |
| 1.1842 | 41.01 | 28500 | 0.2356 | 0.3152 | 0.1026 |
| 1.1668 | 41.73 | 29000 | 0.2319 | 0.3188 | 0.1025 |
| 1.1448 | 42.45 | 29500 | 0.2293 | 0.3099 | 0.0995 |
| 1.1327 | 43.17 | 30000 | 0.2265 | 0.3047 | 0.0979 |
| 1.1307 | 43.88 | 30500 | 0.2222 | 0.3078 | 0.0989 |
| 1.1419 | 44.6 | 31000 | 0.2215 | 0.3038 | 0.0981 |
| 1.1231 | 45.32 | 31500 | 0.2193 | 0.3013 | 0.0972 |
| 1.139 | 46.04 | 32000 | 0.2162 | 0.3007 | 0.0968 |
| 1.1114 | 46.76 | 32500 | 0.2122 | 0.2982 | 0.0960 |
| 1.111 | 47.48 | 33000 | 0.2125 | 0.2946 | 0.0948 |
| 1.0982 | 48.2 | 33500 | 0.2099 | 0.2957 | 0.0953 |
| 1.109 | 48.92 | 34000 | 0.2092 | 0.2955 | 0.0955 |
| 1.0905 | 49.64 | 34500 | 0.2088 | 0.2954 | 0.0953 |
Disclaimer
Important Note
Consider the biases in the pre-training datasets, which may carry over into the results of this model.
Authors
Wav2Vec2 XLS-R 300M Korean was trained and evaluated by Wilson Wongso. All computation and development were done on OVH Cloud.
Framework versions
- Transformers 4.17.0.dev0
- PyTorch 1.10.2+cu102
- Datasets 1.18.2.dev0
- Tokenizers 0.10.3
License
This model is licensed under the Apache 2.0 license.