# wav2vec2-large-xls-r-300m-kk-with-LM
This is a fine-tuned model based on facebook/wav2vec2-xls-r-300m for Automatic Speech Recognition in the Kazakh (kk) language. It delivers solid speech recognition performance on the evaluation datasets listed below.
## Quick Start

### Evaluation Commands

- Evaluate on `mozilla-foundation/common_voice_8_0` with the `test` split (a dataset-loading sketch follows this list):

  ```bash
  python eval.py --model_id DrishtiSharma/wav2vec2-large-xls-r-300m-kk-with-LM --dataset mozilla-foundation/common_voice_8_0 --config kk --split test --log_outputs
  ```

- Evaluate on `speech-recognition-community-v2/dev_data`: not applicable, since Kazakh is not available in this dataset.
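For reference, the test split used by the command above can be loaded with 🤗 Datasets. This is a minimal sketch, not part of the original card; `mozilla-foundation/common_voice_8_0` is gated on the Hub, so it assumes you have accepted the dataset terms and are authenticated.

```python
from datasets import Audio, load_dataset

# Common Voice 8.0 is gated; use_auth_token assumes a prior `huggingface-cli login`.
test = load_dataset("mozilla-foundation/common_voice_8_0", "kk", split="test", use_auth_token=True)

# Resample to the 16 kHz rate expected by XLS-R models.
test = test.cast_column("audio", Audio(sampling_rate=16_000))

print(test[0]["sentence"])        # reference transcription
print(test[0]["audio"]["array"])  # waveform as a float array
```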
## Features

- Multilingual Adaptability: Built on a large-scale multilingual pre-trained model, it adapts well to the Kazakh language.
- High-Performance Metrics: Achieves competitive WER (Word Error Rate) and CER (Character Error Rate) on the evaluation datasets.
## Installation

No specific installation steps are provided in the original model card.
## Usage Examples

The original model card does not include usage examples.
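The following is a minimal transcription sketch under stated assumptions: the checkpoint bundles a `Wav2Vec2ProcessorWithLM` (so `pyctcdecode` and `kenlm` need to be installed alongside `transformers` and `torch`), and `speech` is a 16 kHz mono float array, for example taken from the Common Voice split loaded above.

```python
import torch
from transformers import AutoModelForCTC, AutoProcessor

model_id = "DrishtiSharma/wav2vec2-large-xls-r-300m-kk-with-LM"

# If the repo bundles an n-gram LM, AutoProcessor returns a Wav2Vec2ProcessorWithLM.
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForCTC.from_pretrained(model_id)

def transcribe(speech, sampling_rate=16_000):
    """Transcribe a 16 kHz mono waveform (1-D float array) to Kazakh text."""
    inputs = processor(speech, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    # batch_decode on an LM-backed processor runs beam search over the n-gram LM;
    # it expects the raw logits as a numpy array.
    return processor.batch_decode(logits.numpy()).text[0]
```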
## Documentation

### Model Information

| Property | Details |
|----------|---------|
| Model Type | wav2vec2-large-xls-r-300m-kk-with-LM |
| Training Data | mozilla-foundation/common_voice_8_0 |
### Evaluation Results
The model achieves the following results on the evaluation datasets (a metric-computation sketch follows the list):
- Common Voice 8.0 (kk):
  - Test WER: 0.4355
  - Test CER: 0.10469915859660263
  - Test WER (+LM): 0.417
  - Test CER (+LM): 0.10319098269566598
- Robust Speech Event - Dev Data (kk):
  - Test WER: NA
  - Test CER: NA
- Robust Speech Event - Test Data (kk): results not reported
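The WER/CER figures above follow the standard definitions. The sketch below shows how such scores can be computed with the `jiwer` package; this is an illustration only (the card's own `eval.py` is not reproduced here, and the sentences are made up).

```python
import jiwer

# Illustrative reference/hypothesis pairs; real evaluation uses the full test split.
references  = ["бүгін ауа райы жақсы", "менің атым Айдар"]
predictions = ["бүгін ауа райы жаксы", "менің атым айдар"]

wer = jiwer.wer(references, predictions)  # word error rate
cer = jiwer.cer(references, predictions)  # character error rate
print(f"WER = {wer:.4f}, CER = {cer:.4f}")
```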
### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 0.000222
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1000
- num_epochs: 150.0
- mixed_precision_training: Native AMP
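These values map naturally onto `transformers.TrainingArguments`. The original training script is not included in the card, so the sketch below is an assumption: `output_dir` is a placeholder, and Adam's betas/epsilon are left at their defaults, which match the listed values.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./wav2vec2-large-xls-r-300m-kk-with-LM",  # placeholder path
    learning_rate=0.000222,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,   # 16 x 2 = effective train batch size of 32
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=1000,
    num_train_epochs=150.0,
    fp16=True,                       # "Native AMP" mixed-precision training
)
```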
### Training results

| Training Loss | Epoch | Step | Validation Loss | WER |
|---------------|-------|------|-----------------|-----|
| 9.6799 | 9.09 | 200 | 3.6119 | 1.0 |
| 3.1332 | 18.18 | 400 | 2.5352 | 1.005 |
| 1.0465 | 27.27 | 600 | 0.6169 | 0.682 |
| 0.3452 | 36.36 | 800 | 0.6572 | 0.607 |
| 0.2575 | 45.44 | 1000 | 0.6527 | 0.578 |
| 0.2088 | 54.53 | 1200 | 0.6828 | 0.551 |
| 0.158 | 63.62 | 1400 | 0.7074 | 0.5575 |
| 0.1309 | 72.71 | 1600 | 0.6523 | 0.5595 |
| 0.1074 | 81.8 | 1800 | 0.7262 | 0.5415 |
| 0.087 | 90.89 | 2000 | 0.7199 | 0.521 |
| 0.0711 | 99.98 | 2200 | 0.7113 | 0.523 |
| 0.0601 | 109.09 | 2400 | 0.6863 | 0.496 |
| 0.0451 | 118.18 | 2600 | 0.6998 | 0.483 |
| 0.0378 | 127.27 | 2800 | 0.6971 | 0.4615 |
| 0.0319 | 136.36 | 3000 | 0.7119 | 0.4475 |
| 0.0305 | 145.44 | 3200 | 0.7181 | 0.459 |
### Framework versions
- Transformers 4.17.0.dev0
- Pytorch 1.10.2+cu102
- Datasets 1.18.2.dev0
- Tokenizers 0.11.0
## Technical Details

No in-depth technical details are provided in the original model card.
## License

This model is released under the Apache-2.0 license.