🚀 wav2vec2-large-xls-r-300m-sl-with-LM-v1
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - SL dataset. It performs automatic speech recognition (ASR), converting Slovenian speech to text.
📦 Installation
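The original card lists no installation steps; the commands below sketch a typical environment for running this model (an assumption, not part of the original card):

```bash
# transformers + torch for the model, librosa for audio loading;
# pyctcdecode + kenlm are needed for the LM-boosted decoding.
pip install transformers torch librosa pyctcdecode
pip install https://github.com/kpu/kenlm/archive/master.zip
```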
💻 Usage Examples
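The original card ships no usage snippet. Below is a minimal transcription sketch, assuming the repository includes the n-gram language-model files implied by the "+LM" results and that the packages from the installation section are available; `sample.sl.wav` is a placeholder path:

```python
import librosa
import torch
from transformers import AutoModelForCTC, Wav2Vec2ProcessorWithLM

model_id = "DrishtiSharma/wav2vec2-large-xls-r-300m-sl-with-LM-v1"
processor = Wav2Vec2ProcessorWithLM.from_pretrained(model_id)
model = AutoModelForCTC.from_pretrained(model_id)

# Load audio at the 16 kHz sampling rate XLS-R models expect.
speech, sr = librosa.load("sample.sl.wav", sr=16_000)
inputs = processor(speech, sampling_rate=sr, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# batch_decode on the LM-aware processor applies the n-gram language
# model (the "+LM" rows in the results below).
transcription = processor.batch_decode(logits.numpy()).text[0]
print(transcription)
```

If the LM files were absent, falling back to plain `Wav2Vec2Processor` with greedy `argmax` decoding would correspond to the non-LM numbers in the results.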
📚 Documentation
Model Information
| Property | Details |
|---|---|
| Language | Slovenian (sl) |
| License | Apache-2.0 |
| Tags | automatic-speech-recognition, generated_from_trainer, hf-asr-leaderboard, model_for_talk, mozilla-foundation/common_voice_8_0, robust-speech-event, sl |
| Datasets | mozilla-foundation/common_voice_8_0 |
Model Results
The model `wav2vec2-large-xls-r-300m-sl-with-LM-v1` has the following evaluation results:

- Common Voice 8 (Test Data):
  - Test WER: 0.20626555409164105
  - Test CER: 0.051648321634392154
  - Test WER (+LM): 0.13482652613087395
  - Test CER (+LM): 0.038838663862562475
- Robust Speech Event - Dev Data:
  - Dev WER: 0.5406156320830592
  - Dev CER: 0.22249723590310583
  - Dev WER (+LM): 0.49783147459727384
  - Dev CER (+LM): 0.1591062599627158
- Robust Speech Event - Test Data:
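For reference, WER and CER figures like these can be recomputed with the `evaluate` library (an illustrative snippet, not part of the original card; the strings are hypothetical):

```python
import evaluate

# WER = (substitutions + deletions + insertions) / reference word count;
# CER is the same edit-distance ratio at the character level.
wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

predictions = ["to je test"]    # hypothetical model output
references = ["to je en test"]  # hypothetical ground truth
print(wer_metric.compute(predictions=predictions, references=references))  # 0.25 (one missed word out of four)
print(cer_metric.compute(predictions=predictions, references=references))
```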
Evaluation Commands
- To evaluate on mozilla-foundation/common_voice_8_0 with the test split:

  ```bash
  python eval.py --model_id DrishtiSharma/wav2vec2-large-xls-r-300m-sl-with-LM-v1 --dataset mozilla-foundation/common_voice_8_0 --config sl --split test --log_outputs
  ```

- To evaluate on speech-recognition-community-v2/dev_data:

  ```bash
  python eval.py --model_id DrishtiSharma/wav2vec2-large-xls-r-300m-sl-with-LM-v1 --dataset speech-recognition-community-v2/dev_data --config sl --split validation --chunk_length_s 10 --stride_length_s 1
  ```
Training Hyperparameters
The following hyperparameters were used during training:
- learning_rate: 7.1e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1000
- num_epochs: 100.0
- mixed_precision_training: Native AMP
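For illustration, these hyperparameters map onto `transformers.TrainingArguments` roughly as follows (a sketch, not the author's actual training script; `output_dir` and anything not listed above are placeholders):

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="wav2vec2-large-xls-r-300m-sl-with-LM-v1",  # placeholder
    learning_rate=7.1e-05,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    warmup_steps=1000,
    num_train_epochs=100.0,
    fp16=True,  # "Native AMP" mixed-precision training
)
```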
Training Results
| Training Loss | Epoch | Step | Validation Loss | WER |
|---|---|---|---|---|
| 3.3881 | 6.1 | 500 | 2.9710 | 1.0 |
| 2.6401 | 12.2 | 1000 | 1.7677 | 0.9734 |
| 1.5152 | 18.29 | 1500 | 0.5564 | 0.6011 |
| 1.2191 | 24.39 | 2000 | 0.4319 | 0.4390 |
| 1.0237 | 30.49 | 2500 | 0.3141 | 0.3175 |
| 0.8892 | 36.59 | 3000 | 0.2748 | 0.2689 |
| 0.8296 | 42.68 | 3500 | 0.2680 | 0.2534 |
| 0.7602 | 48.78 | 4000 | 0.2820 | 0.2506 |
| 0.7186 | 54.88 | 4500 | 0.2672 | 0.2398 |
| 0.6887 | 60.98 | 5000 | 0.2729 | 0.2402 |
| 0.6507 | 67.07 | 5500 | 0.2767 | 0.2361 |
| 0.6226 | 73.17 | 6000 | 0.2817 | 0.2332 |
| 0.6024 | 79.27 | 6500 | 0.2679 | 0.2279 |
| 0.5787 | 85.37 | 7000 | 0.2837 | 0.2316 |
| 0.5744 | 91.46 | 7500 | 0.2838 | 0.2284 |
| 0.5556 | 97.56 | 8000 | 0.2763 | 0.2281 |
Framework Versions
- Transformers 4.17.0.dev0
- Pytorch 1.10.2+cu102
- Datasets 1.18.2.dev0
- Tokenizers 0.11.0
📄 License
This model is released under the Apache-2.0 license.