# 🚀 wav2vec2-large-xls-r-300m-bg-d2
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - BG dataset, and aims to provide high-quality automatic speech recognition for Bulgarian.
## ✨ Features
- Multilingual adaptability: built on the pre-trained wav2vec2-xls-r-300m model, it adapts well to different languages.
- High-precision recognition: achieves a low Word Error Rate (WER) and Character Error Rate (CER) on the evaluation set.
## 📦 Installation
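The original card does not list installation steps. Below is a minimal setup sketch, assuming only the libraries implied by the usage example and the framework-versions section:

```bash
# Assumed setup (not from the original card): install the Hugging Face stack and PyTorch.
pip install transformers datasets torch torchaudio
```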
## 💻 Usage Examples
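The original card ships no usage snippet. Below is a minimal sketch using the 🤗 Transformers ASR pipeline; the file name `speech.wav` is a placeholder, and 16 kHz mono input is assumed (the sampling rate wav2vec2 models expect):

```python
from transformers import pipeline

# Sketch only (not from the original card): load the fine-tuned checkpoint
# through the automatic-speech-recognition pipeline.
asr = pipeline(
    "automatic-speech-recognition",
    model="DrishtiSharma/wav2vec2-large-xls-r-300m-bg-d2",
)

# "speech.wav" is a placeholder path; wav2vec2 expects 16 kHz mono audio.
result = asr("speech.wav")
print(result["text"])
```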
## 📚 Documentation
### Model Information
| Property | Details |
|----------|---------|
| Language | Bulgarian (bg) |
| License | Apache-2.0 |
| Tags | automatic-speech-recognition, bg, generated_from_trainer, hf-asr-leaderboard, mozilla-foundation/common_voice_8_0, robust-speech-event |
| Datasets | mozilla-foundation/common_voice_8_0 |
### Model Evaluation Results
- Common Voice 8 dataset:
  - Test WER: 0.28775471338792613
  - Test CER: 0.06861971204625049
- Robust Speech Event - Dev Data:
  - Test WER: 0.49783147459727384
  - Test CER: 0.1591062599627158
- Robust Speech Event - Test Data: not reported
### Evaluation Commands
- To evaluate on `mozilla-foundation/common_voice_8_0` with the `test` split:

```bash
python eval.py --model_id DrishtiSharma/wav2vec2-large-xls-r-300m-bg-d2 --dataset mozilla-foundation/common_voice_8_0 --config bg --split test --log_outputs
```
- To evaluate on `speech-recognition-community-v2/dev_data`:

```bash
python eval.py --model_id DrishtiSharma/wav2vec2-large-xls-r-300m-bg-d2 --dataset speech-recognition-community-v2/dev_data --config bg --split validation --chunk_length_s 10 --stride_length_s 1
```
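The `eval.py` script above is the card's canonical evaluation path. For illustration only, here is a minimal sketch of computing the reported WER/CER metrics with the 🤗 `evaluate` library; the prediction and reference strings are placeholders:

```python
import evaluate

# Placeholder data: in practice, predictions come from running the model over
# the Common Voice bg test split, and references are its transcripts.
predictions = ["примерна транскрипция"]
references = ["примерна транскрипция"]

wer = evaluate.load("wer")
cer = evaluate.load("cer")
print("WER:", wer.compute(predictions=predictions, references=references))
print("CER:", cer.compute(predictions=predictions, references=references))
```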
### Training Hyperparameters
- learning_rate: 0.00025
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 700
- num_epochs: 35
- mixed_precision_training: Native AMP
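For illustration, a minimal sketch of how the hyperparameters above map onto 🤗 `TrainingArguments`; this is an assumption (the original card does not include the training script), and `output_dir` is a placeholder:

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameter list above.
training_args = TrainingArguments(
    output_dir="./wav2vec2-large-xls-r-300m-bg-d2",  # placeholder path
    learning_rate=2.5e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,  # effective train batch size: 16 * 2 = 32
    lr_scheduler_type="linear",
    warmup_steps=700,
    num_train_epochs=35,
    fp16=True,  # Native AMP mixed-precision training
)
```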
### Training Results
| Training Loss | Epoch | Step | Validation Loss | WER |
|---------------|-------|------|-----------------|-----|
| 6.8791 | 1.74 | 200 | 3.1902 | 1.0 |
| 3.0441 | 3.48 | 400 | 2.8098 | 0.9864 |
| 1.1499 | 5.22 | 600 | 0.4668 | 0.5014 |
| 0.4968 | 6.96 | 800 | 0.4162 | 0.4472 |
| 0.3553 | 8.7 | 1000 | 0.3580 | 0.3777 |
| 0.3027 | 10.43 | 1200 | 0.3422 | 0.3506 |
| 0.2562 | 12.17 | 1400 | 0.3556 | 0.3639 |
| 0.2272 | 13.91 | 1600 | 0.3621 | 0.3583 |
| 0.2125 | 15.65 | 1800 | 0.3436 | 0.3358 |
| 0.1904 | 17.39 | 2000 | 0.3650 | 0.3545 |
| 0.1695 | 19.13 | 2200 | 0.3366 | 0.3241 |
| 0.1532 | 20.87 | 2400 | 0.3550 | 0.3311 |
| 0.1453 | 22.61 | 2600 | 0.3582 | 0.3131 |
| 0.1359 | 24.35 | 2800 | 0.3524 | 0.3084 |
| 0.1233 | 26.09 | 3000 | 0.3503 | 0.2973 |
| 0.1114 | 27.83 | 3200 | 0.3434 | 0.2946 |
| 0.1051 | 29.57 | 3400 | 0.3474 | 0.2956 |
| 0.0965 | 31.3 | 3600 | 0.3426 | 0.2907 |
| 0.0923 | 33.04 | 3800 | 0.3478 | 0.2894 |
| 0.0894 | 34.78 | 4000 | 0.3421 | 0.2860 |
### Framework Versions
- Transformers 4.16.2
- PyTorch 1.10.0+cu111
- Datasets 1.18.3
- Tokenizers 0.11.0
## 🔧 Technical Details
This model is a fine-tuned version of wav2vec2-large-xls-r-300m on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - BG dataset. With the training hyperparameters and optimization strategies described above, it achieves good performance on the Bulgarian automatic speech recognition task.
## 📄 License
This model is released under the Apache-2.0 license.