XLS-R 300M PT Model
This is an XLS-R 300M model fine-tuned on the Portuguese subset of Mozilla Foundation's Common Voice 8.0 dataset. It is designed for automatic speech recognition and achieves competitive results on multiple evaluation datasets.
Quick Start
This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - PT dataset.
It achieves the following results on the evaluation set (the final checkpoint in the training results table below):
- Loss: 0.2290
- Wer: 0.2382
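A minimal inference sketch using the Transformers ASR pipeline. The Hub repository id and the audio file path below are placeholders, not values from this card; substitute the actual repository id for this checkpoint.

```python
# Sketch: transcribing Portuguese audio with this model via the Transformers
# ASR pipeline. The model id and audio path are placeholders (assumptions).
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="<hub-username>/xls-r-300m-pt",  # placeholder repo id (assumption)
)

# wav2vec2-style models expect 16 kHz mono audio; the pipeline resamples
# automatically when given a file path.
result = asr("example_pt.wav")  # hypothetical audio file
print(result["text"])
```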
Features
- Multilingual Adaptability: built on the XLS-R architecture, so it can potentially be adapted to other languages.
- Fine-Tuned for Portuguese: specifically optimized for Portuguese speech recognition using the Common Voice 8.0 dataset.
- Competitive Metrics: achieves good WER and CER scores on both the Common Voice 8.0 and Robust Speech Event datasets (a metric-computation sketch follows this list).
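To score transcriptions with the same metrics reported in this card, a hedged sketch using the Hugging Face `evaluate` library (assumes `evaluate` and `jiwer` are installed; the reference and prediction strings are purely illustrative, not drawn from the dataset):

```python
# Sketch: computing WER and CER for ASR outputs.
# Requires: pip install evaluate jiwer
import evaluate

wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

# Illustrative Portuguese transcripts and model outputs (assumptions).
references = ["o gato dorme no sofá", "bom dia a todos"]
predictions = ["o gato dorme no sofa", "bom dia todos"]

wer = wer_metric.compute(predictions=predictions, references=references)
cer = cer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.3f}  CER: {cer:.3f}")
```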
Documentation
Model Index
| Model Name | Task | Dataset | Metrics |
|---|---|---|---|
| xls-r-300m-pt | Speech Recognition (automatic-speech-recognition) | Common Voice 8.0 pt (mozilla-foundation/common_voice_8_0, args: pt) | Test WER: 19.361, Test CER: 5.533 |
| xls-r-300m-pt | Speech Recognition (automatic-speech-recognition) | Robust Speech Event - Dev Data (speech-recognition-community-v2/dev_data, args: fr) | Validation WER: 47.812, Validation CER: 18.805 |
| xls-r-300m-pt | Automatic Speech Recognition (automatic-speech-recognition) | Common Voice 8.0 (mozilla-foundation/common_voice_8_0, args: pt) | Test WER: 19.36 |
| xls-r-300m-pt | Automatic Speech Recognition (automatic-speech-recognition) | Robust Speech Event - Dev Data (speech-recognition-community-v2/dev_data, args: pt) | Test WER: 48.01 |
| xls-r-300m-pt | Automatic Speech Recognition (automatic-speech-recognition) | Robust Speech Event - Test Data (speech-recognition-community-v2/eval_data, args: pt) | Test WER: 49.21 |
Training Procedure
Training Hyperparameters
The following hyperparameters were used during training (see the `TrainingArguments` sketch after this list):
- learning_rate: 0.0002
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1500
- num_epochs: 15.0
- mixed_precision_training: Native AMP
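A minimal sketch of how these values map onto `transformers.TrainingArguments`. Only the listed hyperparameters come from this card; the output directory and everything else about the training setup (model, data, trainer wiring) are assumptions.

```python
# Sketch: the hyperparameters above expressed as transformers.TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./xls-r-300m-pt",    # placeholder output path (assumption)
    learning_rate=2e-4,              # learning_rate: 0.0002
    per_device_train_batch_size=32,  # train_batch_size: 32
    per_device_eval_batch_size=32,   # eval_batch_size: 32
    seed=42,                         # seed: 42
    adam_beta1=0.9,                  # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,               # epsilon=1e-08
    lr_scheduler_type="linear",      # linear schedule
    warmup_steps=1500,               # lr_scheduler_warmup_steps: 1500
    num_train_epochs=15.0,           # num_epochs: 15.0
    fp16=True,                       # mixed_precision_training: Native AMP
)
```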
Training Results
| Training Loss | Epoch | Step | Validation Loss | WER |
|---|---|---|---|---|
| 3.0952 | 0.64 | 500 | 3.0982 | 1.0 |
| 1.7975 | 1.29 | 1000 | 0.7887 | 0.5651 |
| 1.4138 | 1.93 | 1500 | 0.5238 | 0.4389 |
| 1.344 | 2.57 | 2000 | 0.4775 | 0.4318 |
| 1.2737 | 3.21 | 2500 | 0.4648 | 0.4075 |
| 1.2554 | 3.86 | 3000 | 0.4069 | 0.3678 |
| 1.1996 | 4.5 | 3500 | 0.3914 | 0.3668 |
| 1.1427 | 5.14 | 4000 | 0.3694 | 0.3572 |
| 1.1372 | 5.78 | 4500 | 0.3568 | 0.3501 |
| 1.0831 | 6.43 | 5000 | 0.3331 | 0.3253 |
| 1.1074 | 7.07 | 5500 | 0.3332 | 0.3352 |
| 1.0536 | 7.71 | 6000 | 0.3131 | 0.3152 |
| 1.0248 | 8.35 | 6500 | 0.3024 | 0.3023 |
| 1.0075 | 9.0 | 7000 | 0.2948 | 0.3028 |
| 0.979 | 9.64 | 7500 | 0.2796 | 0.2853 |
| 0.9594 | 10.28 | 8000 | 0.2719 | 0.2789 |
| 0.9172 | 10.93 | 8500 | 0.2620 | 0.2695 |
| 0.9047 | 11.57 | 9000 | 0.2537 | 0.2596 |
| 0.8777 | 12.21 | 9500 | 0.2438 | 0.2525 |
| 0.8629 | 12.85 | 10000 | 0.2409 | 0.2493 |
| 0.8575 | 13.5 | 10500 | 0.2366 | 0.2440 |
| 0.8361 | 14.14 | 11000 | 0.2317 | 0.2385 |
| 0.8126 | 14.78 | 11500 | 0.2290 | 0.2382 |
Framework Versions
- Transformers 4.17.0.dev0
- Pytorch 1.10.2+cu102
- Datasets 1.18.2.dev0
- Tokenizers 0.11.0
License
This model is released under the Apache 2.0 license.