# wav2vec2-xls-r-sl-a2

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m for automatic speech recognition (ASR) in Slovenian. It reaches a test WER of about 0.22 and a CER of about 0.05 on Common Voice 8 (see Evaluation Results below).
## 🚀 Quick Start

### Evaluation Commands

To evaluate on mozilla-foundation/common_voice_8_0 with the test split:

```bash
python eval.py --model_id DrishtiSharma/wav2vec2-xls-r-sl-a2 --dataset mozilla-foundation/common_voice_8_0 --config sl --split test --log_outputs
```

For speech-recognition-community-v2/dev_data, no command is given; the original card notes only that the Votic language was not found in speech-recognition-community-v2/dev_data.
## ✨ Features

- High-quality speech recognition: achieves low WER and CER on the evaluated datasets (see Evaluation Results below).
- Fine-tuned for Slovenian: specifically optimized for the Slovenian language and suited to Slovenian speech recognition tasks.
## 📦 Installation

The original card does not document an installation procedure.
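A minimal environment sketch, assuming the standard Hugging Face stack; the exact versions the author used are listed under Framework Versions below:

```bash
# Assumed packages; the original card specifies nothing beyond the
# libraries named in its "Framework Versions" section.
pip install transformers torch datasets tokenizers
```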
## 💻 Usage Examples

The original card does not include usage code.
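The snippet below is an illustrative sketch rather than code from the original card: it loads the checkpoint through the generic transformers ASR pipeline, and the audio file name is a hypothetical 16 kHz mono Slovenian recording.

```python
from transformers import pipeline

# Load the fine-tuned Slovenian checkpoint into the generic ASR pipeline.
asr = pipeline("automatic-speech-recognition", model="DrishtiSharma/wav2vec2-xls-r-sl-a2")

# "sample_sl.wav" is a hypothetical local recording; decoding file paths
# requires ffmpeg to be installed.
result = asr("sample_sl.wav")
print(result["text"])
```

For long recordings, passing chunk_length_s (for example, asr("sample_sl.wav", chunk_length_s=10)) lets the pipeline transcribe in windows rather than in a single forward pass.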
## 📚 Documentation

### Model Information

| Property | Details |
|----------|---------|
| Model Type | wav2vec2-xls-r-sl-a2 |
| Training Data | mozilla-foundation/common_voice_8_0 |
### Evaluation Results

The model achieves the following results:

- Common Voice 8 (sl):
  - Test WER: 0.21695212999560826
  - Test CER: 0.052850080572474256
- Robust Speech Event - Dev Data (vot):
  - Test WER: 0.560722380639029
  - Test CER: 0.2279626093074681
- Robust Speech Event - Dev Data (sl): no values stated in the original card
- Robust Speech Event - Test Data (sl): no values stated in the original card
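As a hedged illustration of how WER/CER figures like these can be reproduced, the sketch below scores the model on a small sample of the Common Voice 8.0 Slovenian test split with the evaluate library. The card does not state its text normalization, so numbers from this sketch may differ from those above.

```python
import evaluate
from datasets import Audio, load_dataset
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="DrishtiSharma/wav2vec2-xls-r-sl-a2")

# Common Voice 8.0 is gated; accept its terms on the Hugging Face Hub first.
ds = load_dataset("mozilla-foundation/common_voice_8_0", "sl", split="test")
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))  # match the model's rate

sample = ds.select(range(100))  # small sample to keep the sketch fast
preds = [asr(ex["audio"]["array"])["text"] for ex in sample]
refs = [ex["sentence"] for ex in sample]

# Note: the official numbers may apply extra normalization (casing, punctuation).
print("WER:", evaluate.load("wer").compute(predictions=preds, references=refs))
print("CER:", evaluate.load("cer").compute(predictions=preds, references=refs))
```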
### Training Hyperparameters

- Learning rate: 7e-05
- Train batch size: 32
- Eval batch size: 32
- Seed: 42
- Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- LR scheduler type: linear
- LR scheduler warmup steps: 1000
- Number of epochs: 100.0
- Mixed precision training: Native AMP
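As an illustration only, not the author's actual training script, the hyperparameters above map onto transformers TrainingArguments as sketched below; the output_dir is hypothetical, and the stated Adam betas and epsilon match the TrainingArguments defaults, so they are not set explicitly.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec2-xls-r-sl-a2",  # hypothetical output directory
    learning_rate=7e-05,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=1000,
    num_train_epochs=100.0,
    fp16=True,  # "Native AMP" mixed-precision training
)
```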
### Training Results

| Training Loss | Epoch | Step | Validation Loss | WER |
|---------------|-------|------|-----------------|--------|
| 6.9294 | 6.1 | 500 | 2.9712 | 1.0 |
| 2.8305 | 12.2 | 1000 | 1.7073 | 0.9479 |
| 1.4795 | 18.29 | 1500 | 0.5756 | 0.6397 |
| 1.3433 | 24.39 | 2000 | 0.4968 | 0.5424 |
| 1.1766 | 30.49 | 2500 | 0.4185 | 0.4743 |
| 1.0017 | 36.59 | 3000 | 0.3303 | 0.3578 |
| 0.9358 | 42.68 | 3500 | 0.3003 | 0.3051 |
| 0.8358 | 48.78 | 4000 | 0.3045 | 0.2884 |
| 0.7647 | 54.88 | 4500 | 0.2866 | 0.2677 |
| 0.7482 | 60.98 | 5000 | 0.2829 | 0.2585 |
| 0.6943 | 67.07 | 5500 | 0.2782 | 0.2478 |
| 0.6586 | 73.17 | 6000 | 0.2911 | 0.2537 |
| 0.6425 | 79.27 | 6500 | 0.2817 | 0.2462 |
| 0.6067 | 85.37 | 7000 | 0.2910 | 0.2436 |
| 0.5974 | 91.46 | 7500 | 0.2875 | 0.2430 |
| 0.5812 | 97.56 | 8000 | 0.2852 | 0.2396 |
### Framework Versions

- Transformers 4.17.0.dev0
- PyTorch 1.10.2+cu102
- Datasets 1.18.2.dev0
- Tokenizers 0.11.0
## 🔧 Technical Details

### Training Process

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the Slovenian (sl) configuration of the mozilla-foundation/common_voice_8_0 dataset. Training used a learning rate of 7e-05, train and eval batch sizes of 32, and a linear learning-rate scheduler with 1000 warmup steps, and ran for 100 epochs with Native AMP mixed-precision training.
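A minimal sketch of the setup this paragraph describes, assuming the standard transformers fine-tuning pattern rather than the author's unpublished script; reusing the released checkpoint's processor stands in for the undocumented vocabulary-building step.

```python
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Common Voice 8.0 is gated; accept its terms on the Hugging Face Hub first.
train_ds = load_dataset("mozilla-foundation/common_voice_8_0", "sl", split="train")

# Reuse the released checkpoint's processor; the original run would have built
# its own character vocabulary from the training transcripts.
processor = Wav2Vec2Processor.from_pretrained("DrishtiSharma/wav2vec2-xls-r-sl-a2")

# Load the base model with a CTC head sized to the Slovenian vocabulary.
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-xls-r-300m",
    ctc_loss_reduction="mean",
    pad_token_id=processor.tokenizer.pad_token_id,
    vocab_size=len(processor.tokenizer),
)
```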
## 📄 License
This model is released under the Apache 2.0 license.