XLS-R-300M - Swedish - CV7 - v2
This is a fine-tuned model for automatic speech recognition (ASR) in Swedish, based on facebook/wav2vec2-xls-r-300m. It reaches a test word error rate (WER) of 15.99% on Common Voice 7 (sv-SE).
Quick Start
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the Swedish (sv-SE) configuration of the mozilla-foundation/common_voice_7_0 dataset. It achieves the following results on the evaluation set:
- Loss: 0.2604
- WER: 0.2334
Features
- Automatic speech recognition: specialized for transcribing Swedish speech.
- Fine-tuned: starts from the pre-trained facebook/wav2vec2-xls-r-300m checkpoint and is fine-tuned on Swedish (sv-SE) data from Common Voice 7.0.
Usage Examples
Documentation
Model Information
| Property | Details |
|---|---|
| Model Type | XLS-R-300M - Swedish - CV7 - v2 |
| Training Data | mozilla-foundation/common_voice_7_0 (sv-SE) |
| Results on Evaluation Set | Loss: 0.2604; WER: 0.2334 |
Model Index
- Name: XLS-R-300M - Swedish - CV7 - v2
- Results:
  - Task 1:
    - Task Name: Automatic Speech Recognition
    - Dataset: Common Voice 7 (mozilla-foundation/common_voice_7_0, args: sv-SE)
    - Metrics:
      - Test WER: 15.99%
      - Test CER: 5.2%
  - Task 2:
    - Task Name: Automatic Speech Recognition
    - Dataset: Robust Speech Event - Dev Data (speech-recognition-community-v2/dev_data, args: sv)
    - Metrics:
      - Test WER: 24.41%
      - Test CER: 11.88%
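WER and CER are both normalized Levenshtein distances: the minimum number of substitutions, deletions and insertions that turn the hypothesis into the reference, divided by the reference length, counted over words for WER and over characters for CER. The plain-Python sketch below illustrates that definition; it does not reproduce any text normalization the evaluation script may apply before scoring, so its numbers are not directly comparable to the scores reported in this card. The corpus-level aggregation (total errors over total reference words) is an assumption about how test-set scores are usually computed.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: substitutions + deletions + insertions."""
    # prev[j] = distance between the ref prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: ref token missing from hyp
                curr[j - 1] + 1,         # insertion: extra token in hyp
                prev[j - 1] + (r != h),  # substitution, or free match
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate for one utterance (reference must be non-empty)."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate for one utterance (reference must be non-empty)."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def corpus_wer(references: List[str], hypotheses: List[str]) -> float:
    """Total word errors divided by total reference words."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    return errors / words
```

For instance, `wer("jag heter anna", "jag hette anna")` is 1/3 (one substituted word out of three). Multiply by 100 to compare with the percentages in the model index.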
Training Procedure
Training Hyperparameters
The following hyperparameters were used during training:
- learning_rate: 7.5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 1
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2000
- num_epochs: 50.0
- mixed_precision_training: Native AMP
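The total batch sizes follow from per-device batch size × number of devices × gradient-accumulation steps (4 × 8 × 1 = 32). A `linear` scheduler with 2000 warmup steps, in the sense of the linear-with-warmup schedule in `transformers`, ramps the learning rate from 0 up to 7.5e-05 over the warmup and then decays it linearly to 0 at the last optimizer step. The sketch below illustrates both calculations; the total number of optimizer steps is not reported in this card, so `TOTAL_STEPS` is a hypothetical placeholder.

```python
PEAK_LR = 7.5e-05     # learning_rate from the list above
WARMUP_STEPS = 2000   # lr_scheduler_warmup_steps from the list above
TOTAL_STEPS = 20_000  # hypothetical: the real run length is not given in this card


def effective_batch_size(per_device: int, num_devices: int, grad_accum_steps: int) -> int:
    """Number of samples contributing to one optimizer step."""
    return per_device * num_devices * grad_accum_steps


def linear_lr(step: int, total_steps: int = TOTAL_STEPS,
              peak_lr: float = PEAK_LR, warmup_steps: int = WARMUP_STEPS) -> float:
    """Linear warmup from 0 to peak_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)
```

With these values, `effective_batch_size(4, 8, 1)` gives 32, and the learning rate is half its peak at step 1000, reaches 7.5e-05 at step 2000, and falls back to 0 at `TOTAL_STEPS`.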
Training Results
See the TensorBoard logs.
Evaluation Commands
- To evaluate on mozilla-foundation/common_voice_7_0 with split test:
  python eval.py --model_id patrickvonplaten/xls-r-300-sv-cv7 --dataset mozilla-foundation/common_voice_7_0 --config sv-SE --split test
- To evaluate on speech-recognition-community-v2/dev_data with split validation:
  python eval.py --model_id patrickvonplaten/xls-r-300-sv-cv7 --dataset speech-recognition-community-v2/dev_data --config sv --split validation --chunk_length_s 5.0 --stride_length_s 1.0
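The `--chunk_length_s 5.0 --stride_length_s 1.0` flags in the dev-data command suggest that long recordings are transcribed in overlapping windows. The sketch below is a simplified, hypothetical illustration of that idea: the helper names are not part of any library, and the real `transformers` pipeline works in audio samples and model output frames rather than seconds. Each 5 s window overlaps its neighbours by 1 s on each side, so consecutive windows start 5 - 2 × 1 = 3 s apart, and only the centre of each window (plus the outer edges of the first and last windows) is kept when the pieces are stitched together.

```python
from typing import List, Tuple

Window = Tuple[float, float]  # (start_s, end_s)


def chunk_windows(duration_s: float, chunk_length_s: float = 5.0,
                  stride_length_s: float = 1.0) -> List[Window]:
    """Split [0, duration_s] into overlapping windows of chunk_length_s seconds."""
    step = chunk_length_s - 2 * stride_length_s  # new audio covered by each window
    if step <= 0:
        raise ValueError("chunk_length_s must exceed 2 * stride_length_s")
    windows: List[Window] = []
    start = 0.0
    while True:
        end = min(start + chunk_length_s, duration_s)
        windows.append((start, end))
        if end >= duration_s:
            break
        start += step
    return windows


def kept_regions(windows: List[Window], duration_s: float,
                 stride_length_s: float = 1.0) -> List[Window]:
    """Drop the overlapping stride at every inner window edge."""
    kept: List[Window] = []
    for start, end in windows:
        left = start if start == 0.0 else start + stride_length_s
        right = end if end >= duration_s else end - stride_length_s
        kept.append((left, right))
    return kept
```

For an 11 s recording, `chunk_windows(11.0)` yields the windows (0, 5), (3, 8) and (6, 11), and `kept_regions` trims them to (0, 4), (4, 7) and (7, 11), which tile the recording without gaps or double counting.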
Framework Versions
- Transformers 4.17.0.dev0
- PyTorch 1.9.0+cu111
- Datasets 1.18.4.dev0
- Tokenizers 0.10.3
License
This model is licensed under the Apache-2.0 license.