🚀 wav2vec2-large-xls-r-300m-br-d2
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the mozilla-foundation/common_voice_8_0 (br) dataset. It is designed for automatic speech recognition, aiming to accurately transcribe Breton speech.
✨ Features
- Fine-tuned Model: Based on the pre-trained facebook/wav2vec2-xls-r-300m, fine-tuned on the Breton (br) subset of the Common Voice 8.0 dataset.
- Evaluation Metrics: Reports Loss, WER, and CER on the evaluation sets.
📚 Documentation
Model Information
| Property | Details |
|----------|---------|
| Model Type | Fine-tuned version of wav2vec2-large-xls-r-300m on the Breton (br) Common Voice dataset |
| Training Data | mozilla-foundation/common_voice_8_0 (br) |
| Metrics Used | wer, cer |
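To make the two metrics concrete: WER is the word-level edit distance between the reference and the hypothesis, divided by the number of reference words (CER is the same computed over characters). The actual evaluation used the standard Hugging Face metric implementations; the following is only a minimal illustrative sketch, with made-up example strings.

```python
# Word Error Rate via word-level Levenshtein distance (illustrative sketch;
# the reported numbers come from the standard `wer`/`cer` metric implementations).

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

# One substituted word out of three reference words -> WER = 1/3
print(round(wer("demat deoc'h holl", "demat deoch holl"), 4))  # 0.3333
```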
Model Index
- Name: wav2vec2-large-xls-r-300m-br-d2
- Results:
- Task 1: Speech Recognition on Common Voice 8 (br)
- Dataset: mozilla-foundation/common_voice_8_0 (br)
- Metrics:
- Test WER: 0.49770598355954887
- Test CER: 0.18090500890299605
- Task 2: Automatic Speech Recognition on Robust Speech Event - Dev Data (br)
- Dataset: speech-recognition-community-v2/dev_data (br)
- Metrics:
- Test WER: NA
- Test CER: NA
Evaluation Results
This model achieves the following results on the evaluation set (final checkpoint, step 7400):
- Loss: 1.1257
- Wer: 0.4631
Evaluation Commands
- To evaluate on mozilla-foundation/common_voice_8_0 with the test split:

```bash
python eval.py --model_id DrishtiSharma/wav2vec2-large-xls-r-300m-br-d2 --dataset mozilla-foundation/common_voice_8_0 --config br --split test --log_outputs
```
- To evaluate on speech-recognition-community-v2/dev_data: not applicable, since the Breton language is not available in speech-recognition-community-v2/dev_data.
Training Hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.00034
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 750
- num_epochs: 50
- mixed_precision_training: Native AMP
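Two of these values follow from the others: the total train batch size is train_batch_size × gradient_accumulation_steps = 16 × 2 = 32, and the `linear` scheduler ramps the learning rate up over the 750 warmup steps and then decays it linearly to zero by the final step (7400, per the training table below). A minimal sketch of that schedule, assuming the standard linear-with-warmup shape:

```python
# Linear warmup then linear decay, matching the `linear` lr_scheduler_type
# with the hyperparameters above. TOTAL_STEPS is the run's final step (7400).

PEAK_LR = 0.00034
WARMUP_STEPS = 750
TOTAL_STEPS = 7400

def lr_at(step: int) -> float:
    if step < WARMUP_STEPS:
        # Ramp from 0 to the peak over the warmup phase.
        return PEAK_LR * step / WARMUP_STEPS
    # Decay linearly from the peak to 0 at the final step.
    return PEAK_LR * max(0, TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS)

print(lr_at(750))  # 0.00034 (peak, reached at the end of warmup)
```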
Training Results
| Training Loss | Epoch | Step | Validation Loss | Wer |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| 14.0379 | 0.68 | 100 | 5.6808 | 1.0 |
| 3.9145 | 1.35 | 200 | 3.1970 | 1.0 |
| 3.0293 | 2.03 | 300 | 2.9513 | 1.0 |
| 2.0927 | 2.7 | 400 | 1.4545 | 0.8887 |
| 1.1556 | 3.38 | 500 | 1.0966 | 0.7564 |
| 0.9628 | 4.05 | 600 | 0.9808 | 0.7364 |
| 0.7869 | 4.73 | 700 | 1.0488 | 0.7355 |
| 0.703 | 5.41 | 800 | 0.9500 | 0.6881 |
| 0.6657 | 6.08 | 900 | 0.9309 | 0.6259 |
| 0.5663 | 6.76 | 1000 | 0.9133 | 0.6357 |
| 0.496 | 7.43 | 1100 | 0.9890 | 0.6028 |
| 0.4748 | 8.11 | 1200 | 0.9469 | 0.5894 |
| 0.4135 | 8.78 | 1300 | 0.9270 | 0.6045 |
| 0.3579 | 9.46 | 1400 | 0.8818 | 0.5708 |
| 0.353 | 10.14 | 1500 | 0.9244 | 0.5781 |
| 0.334 | 10.81 | 1600 | 0.9009 | 0.5638 |
| 0.2917 | 11.49 | 1700 | 1.0132 | 0.5828 |
| 0.29 | 12.16 | 1800 | 0.9696 | 0.5668 |
| 0.2691 | 12.84 | 1900 | 0.9811 | 0.5455 |
| 0.25 | 13.51 | 2000 | 0.9951 | 0.5624 |
| 0.2467 | 14.19 | 2100 | 0.9653 | 0.5573 |
| 0.2242 | 14.86 | 2200 | 0.9714 | 0.5378 |
| 0.2066 | 15.54 | 2300 | 0.9829 | 0.5394 |
| 0.2075 | 16.22 | 2400 | 1.0547 | 0.5520 |
| 0.1923 | 16.89 | 2500 | 1.0014 | 0.5397 |
| 0.1919 | 17.57 | 2600 | 0.9978 | 0.5477 |
| 0.1908 | 18.24 | 2700 | 1.1064 | 0.5397 |
| 0.157 | 18.92 | 2800 | 1.0629 | 0.5238 |
| 0.159 | 19.59 | 2900 | 1.0642 | 0.5321 |
| 0.1652 | 20.27 | 3000 | 1.0207 | 0.5328 |
| 0.141 | 20.95 | 3100 | 0.9948 | 0.5312 |
| 0.1417 | 21.62 | 3200 | 1.0338 | 0.5328 |
| 0.1514 | 22.3 | 3300 | 1.0513 | 0.5313 |
| 0.1365 | 22.97 | 3400 | 1.0357 | 0.5291 |
| 0.1319 | 23.65 | 3500 | 1.0587 | 0.5167 |
| 0.1298 | 24.32 | 3600 | 1.0636 | 0.5236 |
| 0.1245 | 25.0 | 3700 | 1.1367 | 0.5280 |
| 0.1114 | 25.68 | 3800 | 1.0633 | 0.5200 |
| 0.1088 | 26.35 | 3900 | 1.0495 | 0.5210 |
| 0.1175 | 27.03 | 4000 | 1.0897 | 0.5095 |
| 0.1043 | 27.7 | 4100 | 1.0580 | 0.5309 |
| 0.0951 | 28.38 | 4200 | 1.0448 | 0.5067 |
| 0.1011 | 29.05 | 4300 | 1.0665 | 0.5137 |
| 0.0889 | 29.73 | 4400 | 1.0579 | 0.5026 |
| 0.0833 | 30.41 | 4500 | 1.0740 | 0.5037 |
| 0.0889 | 31.08 | 4600 | 1.0933 | 0.5083 |
| 0.0784 | 31.76 | 4700 | 1.0715 | 0.5089 |
| 0.0767 | 32.43 | 4800 | 1.0658 | 0.5049 |
| 0.0769 | 33.11 | 4900 | 1.1118 | 0.4979 |
| 0.0722 | 33.78 | 5000 | 1.1413 | 0.4986 |
| 0.0709 | 34.46 | 5100 | 1.0706 | 0.4885 |
| 0.0664 | 35.14 | 5200 | 1.1217 | 0.4884 |
| 0.0648 | 35.81 | 5300 | 1.1298 | 0.4941 |
| 0.0657 | 36.49 | 5400 | 1.1330 | 0.4920 |
| 0.0582 | 37.16 | 5500 | 1.0598 | 0.4835 |
| 0.0602 | 37.84 | 5600 | 1.1097 | 0.4943 |
| 0.0598 | 38.51 | 5700 | 1.0976 | 0.4876 |
| 0.0547 | 39.19 | 5800 | 1.0734 | 0.4825 |
| 0.0561 | 39.86 | 5900 | 1.0926 | 0.4850 |
| 0.0516 | 40.54 | 6000 | 1.1579 | 0.4751 |
| 0.0478 | 41.22 | 6100 | 1.1384 | 0.4706 |
| 0.0396 | 41.89 | 6200 | 1.1462 | 0.4739 |
| 0.0472 | 42.57 | 6300 | 1.1277 | 0.4732 |
| 0.0447 | 43.24 | 6400 | 1.1517 | 0.4752 |
| 0.0423 | 43.92 | 6500 | 1.1219 | 0.4784 |
| 0.0426 | 44.59 | 6600 | 1.1311 | 0.4724 |
| 0.0391 | 45.27 | 6700 | 1.1135 | 0.4692 |
| 0.0362 | 45.95 | 6800 | 1.0878 | 0.4645 |
| 0.0329 | 46.62 | 6900 | 1.1137 | 0.4668 |
| 0.0356 | 47.3 | 7000 | 1.1233 | 0.4687 |
| 0.0328 | 47.97 | 7100 | 1.1238 | 0.4653 |
| 0.0323 | 48.65 | 7200 | 1.1307 | 0.4646 |
| 0.0325 | 49.32 | 7300 | 1.1242 | 0.4645 |
| 0.03 | 50.0 | 7400 | 1.1257 | 0.4631 |
Framework Versions
- Transformers 4.16.2
- Pytorch 1.10.0+cu111
- Datasets 1.18.3
- Tokenizers 0.11.0
📄 License
This model is released under the Apache-2.0 license.