🚀 wav2vec2-large-xls-r-300m-sl-with-LM-v2
This is a fine-tuned version of facebook/wav2vec2-xls-r-300m for automatic speech recognition in Slovenian. It was trained on the mozilla-foundation/common_voice_8_0 (sl) dataset and reaches a test WER of 0.1455 on Common Voice 8 with language-model decoding.
✨ Features
- Automatic Speech Recognition: Specialized for Slovenian speech recognition.
- Trained on a High-Quality Dataset: Utilizes the Mozilla Foundation's Common Voice 8.0 dataset.
- Robust Performance: Evaluated on both the Common Voice 8 test set and the Robust Speech Event dev data.
📦 Installation
The original card lists no installation steps. In practice, the model loads through the Hugging Face `transformers` library; decoding with the bundled language model additionally requires `pyctcdecode` and `kenlm` (e.g. `pip install transformers pyctcdecode kenlm`).
💻 Usage Examples
Basic Usage
The following are the evaluation commands for different datasets:
Evaluate on `mozilla-foundation/common_voice_8_0`:

```bash
python eval.py --model_id DrishtiSharma/wav2vec2-large-xls-r-300m-sl-with-LM-v2 --dataset mozilla-foundation/common_voice_8_0 --config sl --split test --log_outputs
```
Evaluate on `speech-recognition-community-v2/dev_data`:

```bash
python eval.py --model_id DrishtiSharma/wav2vec2-large-xls-r-300m-sl-with-LM-v2 --dataset speech-recognition-community-v2/dev_data --config sl --split validation --chunk_length_s 10 --stride_length_s 1
```
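Beyond the eval script, the model can be used for inference directly. Below is a minimal sketch (not from the original card) that loads the checkpoint with its LM-backed decoder and transcribes a local 16 kHz audio file; the path `sample_sl.wav` is a placeholder, and `librosa`, `pyctcdecode`, and `kenlm` are assumed to be installed.

```python
# Hedged inference sketch: load the model and its LM-backed processor,
# then transcribe a local Slovenian audio file ("sample_sl.wav" is a placeholder).
import torch
import librosa
from transformers import AutoModelForCTC, Wav2Vec2ProcessorWithLM

model_id = "DrishtiSharma/wav2vec2-large-xls-r-300m-sl-with-LM-v2"
model = AutoModelForCTC.from_pretrained(model_id)
processor = Wav2Vec2ProcessorWithLM.from_pretrained(model_id)

# wav2vec2 expects 16 kHz mono input
speech, _ = librosa.load("sample_sl.wav", sr=16_000, mono=True)
inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# batch_decode on a Wav2Vec2ProcessorWithLM runs pyctcdecode beam search
transcription = processor.batch_decode(logits.numpy()).text[0]
print(transcription)
```

Decoding without the language model (e.g. with a plain `Wav2Vec2Processor` and argmax decoding) corresponds to the non-LM WER/CER figures reported below.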
📚 Documentation
Model Information
| Property | Details |
|----------|---------|
| Model Type | wav2vec2-large-xls-r-300m-sl-with-LM-v2 |
| Training Datasets | mozilla-foundation/common_voice_8_0 |
| Languages | Slovenian (sl) |
Evaluation Results
The model achieves the following evaluation results (a sketch of how the WER/CER metrics are computed follows the list):
- Common Voice 8:
- Test WER: 0.21695212999560826
- Test CER: 0.052850080572474256
- Test WER (+LM): 0.14551310203484116
- Test CER (+LM): 0.03927566711277415
- Robust Speech Event - Dev Data:
- Dev WER: 0.560722380639029
- Dev CER: 0.2279626093074681
- Dev WER (+LM): 0.46486802661402354
- Dev CER (+LM): 0.21105136194592422
- Robust Speech Event - Test Data: no metrics reported.
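For context, WER and CER are normalized edit distances over words and characters, respectively. A minimal illustration using the `jiwer` package (an assumption, not part of the original card; the Slovenian strings are made-up placeholders):

```python
# Illustrative WER/CER computation with jiwer (placeholder strings).
import jiwer

reference = "danes je lep dan"   # ground-truth transcript
hypothesis = "danes je lep den"  # model output with one word error

print(f"WER: {jiwer.wer(reference, hypothesis)}")  # 1 substituted word / 4 words = 0.25
print(f"CER: {jiwer.cer(reference, hypothesis)}")  # character-level analogue
```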
Training Hyperparameters
The following hyperparameters were used during training (see the `TrainingArguments` sketch after this list):
- learning_rate: 7e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1000
- num_epochs: 100.0
- mixed_precision_training: Native AMP
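These values map directly onto the standard 🤗 `TrainingArguments`. The following is a hedged reconstruction from the listed hyperparameters, not the author's original training script; `output_dir` is a placeholder:

```python
# Hedged reconstruction of the training configuration from the listed
# hyperparameters; this is not the author's original script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./wav2vec2-large-xls-r-300m-sl",  # placeholder
    learning_rate=7e-05,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    adam_beta1=0.9,            # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    warmup_steps=1000,
    num_train_epochs=100.0,
    fp16=True,                 # "Native AMP" mixed-precision training
)
```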
Training Results
| Training Loss | Epoch | Step | Validation Loss | WER |
|---------------|-------|------|-----------------|--------|
| 6.9294 | 6.1 | 500 | 2.9712 | 1.0 |
| 2.8305 | 12.2 | 1000 | 1.7073 | 0.9479 |
| 1.4795 | 18.29 | 1500 | 0.5756 | 0.6397 |
| 1.3433 | 24.39 | 2000 | 0.4968 | 0.5424 |
| 1.1766 | 30.49 | 2500 | 0.4185 | 0.4743 |
| 1.0017 | 36.59 | 3000 | 0.3303 | 0.3578 |
| 0.9358 | 42.68 | 3500 | 0.3003 | 0.3051 |
| 0.8358 | 48.78 | 4000 | 0.3045 | 0.2884 |
| 0.7647 | 54.88 | 4500 | 0.2866 | 0.2677 |
| 0.7482 | 60.98 | 5000 | 0.2829 | 0.2585 |
| 0.6943 | 67.07 | 5500 | 0.2782 | 0.2478 |
| 0.6586 | 73.17 | 6000 | 0.2911 | 0.2537 |
| 0.6425 | 79.27 | 6500 | 0.2817 | 0.2462 |
| 0.6067 | 85.37 | 7000 | 0.2910 | 0.2436 |
| 0.5974 | 91.46 | 7500 | 0.2875 | 0.2430 |
| 0.5812 | 97.56 | 8000 | 0.2852 | 0.2396 |
Framework Versions
- Transformers 4.17.0.dev0
- Pytorch 1.10.2+cu102
- Datasets 1.18.2.dev0
- Tokenizers 0.11.0
📄 License
This model is licensed under the Apache-2.0 license.