🚀 wav2vec2-large-xls-r-300m-hsb-v2
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - HSB dataset. It is designed for automatic speech recognition, aiming to accurately transcribe speech in the Upper Sorbian (hsb) language.
✨ Features
- Multilingual Adaptation: Built on the large-scale wav2vec2-xls-r-300m model, it adapts well to the Upper Sorbian (hsb) language.
- High-quality Performance: Achieves relatively low WER and CER on the evaluation set, indicating high recognition accuracy.
📦 Installation
No specific installation steps are provided in the original document.
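As a hedged guideline (not from the original card), the model can be used with the standard Hugging Face stack, e.g. `pip install transformers torch datasets`; the exact versions used during training are listed under Framework Versions below.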
💻 Usage Examples
No code examples are provided in the original document.
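As a minimal sketch (not from the original card), the model can be used for transcription via the `transformers` ASR pipeline. The model ID matches the one in the evaluation command below; the audio file path is a placeholder.

```python
# Minimal inference sketch; assumes transformers and an audio backend
# (e.g. torchaudio or librosa) are installed.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="DrishtiSharma/wav2vec2-large-xls-r-300m-hsb-v2",
)

# "sample_hsb.wav" is a placeholder path; wav2vec2 models expect 16 kHz audio.
print(asr("sample_hsb.wav")["text"])
```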
📚 Documentation
Evaluation Results
Per-checkpoint validation loss and WER on the evaluation set are listed in the Training Results table below; the final checkpoint reaches a validation loss of 0.5328 and a WER of 0.4596.
Evaluation Commands
- To evaluate on mozilla-foundation/common_voice_8_0 with the test split (a hedged Python sketch of this evaluation follows this list):
python eval.py --model_id DrishtiSharma/wav2vec2-large-xls-r-300m-hsb-v2 --dataset mozilla-foundation/common_voice_8_0 --config hsb --split test --log_outputs
- To evaluate on speech-recognition-community-v2/dev_data:
Upper Sorbian (hsb) is not available in speech-recognition-community-v2/dev_data, so no evaluation command is provided for it.
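For readers who prefer Python over the CLI, the following is a rough sketch of what the test-split evaluation does; `eval.py` in the model repository is the authoritative script. The sketch assumes you have accepted the Common Voice 8.0 terms on the Hugging Face Hub (the dataset is gated) and uses `datasets.load_metric`, the metric API current for Datasets 1.18.

```python
# Hedged sketch of the test-split WER evaluation; eval.py is authoritative.
from datasets import load_dataset, load_metric
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="DrishtiSharma/wav2vec2-large-xls-r-300m-hsb-v2",
)

# Gated dataset: requires prior access approval and Hub authentication.
test = load_dataset(
    "mozilla-foundation/common_voice_8_0", "hsb",
    split="test", use_auth_token=True,
)

wer = load_metric("wer")
predictions = [asr(sample["path"])["text"] for sample in test]
print("WER:", wer.compute(predictions=predictions, references=test["sentence"]))
```

Because eval.py may normalize transcriptions before scoring, the WER from this sketch can differ from the officially reported figure.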
Training Hyperparameters
The following hyperparameters were used during training:
| Property | Details |
| --- | --- |
| Learning Rate | 0.00045 |
| Train Batch Size | 16 |
| Eval Batch Size | 8 |
| Seed | 42 |
| Gradient Accumulation Steps | 2 |
| Total Train Batch Size | 32 |
| Optimizer | Adam with betas=(0.9, 0.999) and epsilon=1e-08 |
| LR Scheduler Type | linear |
| LR Scheduler Warmup Steps | 500 |
| Num Epochs | 50 |
| Mixed Precision Training | Native AMP |
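Expressed as Hugging Face `TrainingArguments`, these settings would look roughly like the sketch below. This is a reconstruction from the table, not the original training script, and the output directory is a placeholder.

```python
from transformers import TrainingArguments

# Reconstructed from the hyperparameter table above; not the original script.
training_args = TrainingArguments(
    output_dir="./wav2vec2-large-xls-r-300m-hsb-v2",  # placeholder
    learning_rate=0.00045,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,  # 16 x 2 = effective train batch size 32
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=50,
    fp16=True,  # native AMP mixed precision
    # Adam betas (0.9, 0.999) and epsilon 1e-08 are the optimizer defaults.
)
```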
Training Results
| Training Loss | Epoch | Step | Validation Loss | WER |
| --- | --- | --- | --- | --- |
| 8.5979 | 3.23 | 100 | 3.5602 | 1.0 |
| 3.303 | 6.45 | 200 | 3.2238 | 1.0 |
| 3.2034 | 9.68 | 300 | 3.2002 | 0.9888 |
| 2.7986 | 12.9 | 400 | 1.2408 | 0.9210 |
| 1.3869 | 16.13 | 500 | 0.7973 | 0.7462 |
| 1.0228 | 19.35 | 600 | 0.6722 | 0.6788 |
| 0.8311 | 22.58 | 700 | 0.6100 | 0.6150 |
| 0.717 | 25.81 | 800 | 0.6236 | 0.6013 |
| 0.6264 | 29.03 | 900 | 0.6031 | 0.5575 |
| 0.5494 | 32.26 | 1000 | 0.5656 | 0.5309 |
| 0.4781 | 35.48 | 1100 | 0.5289 | 0.4996 |
| 0.4311 | 38.71 | 1200 | 0.5375 | 0.4768 |
| 0.3902 | 41.94 | 1300 | 0.5246 | 0.4703 |
| 0.3508 | 45.16 | 1400 | 0.5382 | 0.4696 |
| 0.3199 | 48.39 | 1500 | 0.5328 | 0.4596 |
Framework Versions
- Transformers 4.16.1
- Pytorch 1.10.0+cu111
- Datasets 1.18.2
- Tokenizers 0.11.0
🔧 Technical Details
The model is fine-tuned from the pre-trained wav2vec2-xls-r-300m model on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - HSB dataset. Through the hyperparameter settings and training process described above, it adapts to the characteristics of Upper Sorbian and achieves good speech recognition performance.
📄 License
This model is released under the Apache 2.0 license.