wav2vec2-xls-r-300m-kk-n2
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - KK dataset. It performs automatic speech recognition (ASR), transcribing Kazakh speech to text.
Quick Start
Evaluation-set loss and WER for each logged checkpoint are listed under Training results.
Features
- Fine-Tuned: Based on the pre-trained facebook/wav2vec2-xls-r-300m, fine-tuned on the Kazakh dataset for better performance.
- Multiple Evaluation Metrics: Evaluated using Loss, WER (Word Error Rate), and CER (Character Error Rate).
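As an illustration of the two error-rate metrics named above, here is a minimal sketch of WER and CER built on edit distance. It is not the evaluation code used for this model; the function names and the example strings are made up for illustration. Both metrics divide the number of edits by the reference length, so insertions can push the value above 1.0, which is how a WER such as 1.005 can appear in a training log.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the hypothesis
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Placeholder strings, not real Kazakh transcripts.
print(wer("a b c d", "a x c d e"))  # one substitution + one insertion over 4 words
print(cer("abcd", "abxd"))          # one substitution over 4 characters
```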
Installation
No specific installation steps are provided in the original README.
Usage Examples
Evaluation Commands
Evaluate on mozilla-foundation/common_voice_8_0 with test split
python eval.py --model_id DrishtiSharma/wav2vec2-xls-r-300m-kk-n2 --dataset mozilla-foundation/common_voice_8_0 --config kk --split test --log_outputs
Evaluate on speech-recognition-community-v2/dev_data
Kazakh language not found in speech-recognition-community-v2/dev_data!
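Beyond evaluation, a single recording could be transcribed along the following lines. This is a hedged sketch, not taken from the original README: it assumes the `transformers` and `torch` packages are installed, that `ffmpeg` is available to decode the audio file, and that the checkpoint is published under the model ID used in the evaluation command above. The file name `sample_kk.wav` is hypothetical.

```python
MODEL_ID = "DrishtiSharma/wav2vec2-xls-r-300m-kk-n2"  # ID taken from the eval command


def transcribe(audio_path, model_id=MODEL_ID):
    """Return the text transcription of one audio file."""
    # Imported lazily so this module loads even where transformers is absent.
    from transformers import pipeline

    # The ASR pipeline bundles feature extraction, the CTC model, and decoding.
    asr = pipeline("automatic-speech-recognition", model=model_id)
    return asr(audio_path)["text"]


# Example call (hypothetical file name, downloads the model on first use):
# print(transcribe("sample_kk.wav"))
```

wav2vec2 models of this family expect 16 kHz mono audio; when given a file path, the pipeline decodes the file with ffmpeg at the rate the model's feature extractor declares.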
Documentation
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.000222
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1000
- num_epochs: 150.0
- mixed_precision_training: Native AMP
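To show how some of these values relate, the sketch below derives the total batch size from the per-step batch size and gradient accumulation, and evaluates a linear schedule with warmup at a given step. It is plain Python for illustration, not the training script. Two things are assumptions: a single training device (consistent with 16 × 2 = 32), and `TOTAL_STEPS = 3300`, which is inferred from the training log (about 22 optimizer steps per epoch over 150 epochs) and is not stated in the hyperparameter list.

```python
LEARNING_RATE = 0.000222   # peak learning rate from the list above
TRAIN_BATCH_SIZE = 16
GRAD_ACCUM_STEPS = 2
WARMUP_STEPS = 1000
TOTAL_STEPS = 3300         # assumption: ~22 steps/epoch * 150 epochs


def effective_batch_size(per_step, accum_steps, num_devices=1):
    """Samples contributing to one optimizer update."""
    return per_step * accum_steps * num_devices


def linear_schedule_lr(step, peak_lr=LEARNING_RATE,
                       warmup_steps=WARMUP_STEPS, total_steps=TOTAL_STEPS):
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))  # 32
for step in (0, 500, 1000, 2150, 3300):
    print(step, linear_schedule_lr(step))
```

Under these assumptions the learning rate peaks at step 1000, roughly epoch 45 of 150, so nearly a third of training is spent in warmup.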
Training results
| Training Loss | Epoch | Step | Validation Loss | Wer |
| --- | --- | --- | --- | --- |
| 9.6799 | 9.09 | 200 | 3.6119 | 1.0 |
| 3.1332 | 18.18 | 400 | 2.5352 | 1.005 |
| 1.0465 | 27.27 | 600 | 0.6169 | 0.682 |
| 0.3452 | 36.36 | 800 | 0.6572 | 0.607 |
| 0.2575 | 45.44 | 1000 | 0.6527 | 0.578 |
| 0.2088 | 54.53 | 1200 | 0.6828 | 0.551 |
| 0.158 | 63.62 | 1400 | 0.7074 | 0.5575 |
| 0.1309 | 72.71 | 1600 | 0.6523 | 0.5595 |
| 0.1074 | 81.8 | 1800 | 0.7262 | 0.5415 |
| 0.087 | 90.89 | 2000 | 0.7199 | 0.521 |
| 0.0711 | 99.98 | 2200 | 0.7113 | 0.523 |
| 0.0601 | 109.09 | 2400 | 0.6863 | 0.496 |
| 0.0451 | 118.18 | 2600 | 0.6998 | 0.483 |
| 0.0378 | 127.27 | 2800 | 0.6971 | 0.4615 |
| 0.0319 | 136.36 | 3000 | 0.7119 | 0.4475 |
| 0.0305 | 145.44 | 3200 | 0.7181 | 0.459 |
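The log above can be read programmatically to see which checkpoint scores best on each metric. The following sketch copies the logged values into a list and picks the rows with the lowest WER and the lowest validation loss. The `Row` type and variable names are illustrative only.

```python
from collections import namedtuple

Row = namedtuple("Row", "train_loss epoch step val_loss wer")

# Values copied from the training results log.
LOG = [
    Row(9.6799, 9.09, 200, 3.6119, 1.0),
    Row(3.1332, 18.18, 400, 2.5352, 1.005),
    Row(1.0465, 27.27, 600, 0.6169, 0.682),
    Row(0.3452, 36.36, 800, 0.6572, 0.607),
    Row(0.2575, 45.44, 1000, 0.6527, 0.578),
    Row(0.2088, 54.53, 1200, 0.6828, 0.551),
    Row(0.158, 63.62, 1400, 0.7074, 0.5575),
    Row(0.1309, 72.71, 1600, 0.6523, 0.5595),
    Row(0.1074, 81.8, 1800, 0.7262, 0.5415),
    Row(0.087, 90.89, 2000, 0.7199, 0.521),
    Row(0.0711, 99.98, 2200, 0.7113, 0.523),
    Row(0.0601, 109.09, 2400, 0.6863, 0.496),
    Row(0.0451, 118.18, 2600, 0.6998, 0.483),
    Row(0.0378, 127.27, 2800, 0.6971, 0.4615),
    Row(0.0319, 136.36, 3000, 0.7119, 0.4475),
    Row(0.0305, 145.44, 3200, 0.7181, 0.459),
]

best_by_wer = min(LOG, key=lambda row: row.wer)
best_by_loss = min(LOG, key=lambda row: row.val_loss)

print(best_by_wer.step, best_by_wer.wer)         # 3000 0.4475
print(best_by_loss.step, best_by_loss.val_loss)  # 600 0.6169
```

The two criteria disagree: validation loss bottoms out at step 600, while WER keeps improving until step 3000. Selecting a checkpoint by WER would therefore pick a much later step than selecting by loss.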
Framework versions
- Transformers 4.17.0.dev0
- Pytorch 1.10.2+cu102
- Datasets 1.18.2.dev0
- Tokenizers 0.11.0
Technical Details
The model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - KK dataset. Training ran for 150 epochs with Adam, a learning rate of 0.000222 on a linear schedule with 1000 warmup steps, a total batch size of 32 (16 per step with 2 gradient accumulation steps), and native AMP mixed precision.
License
The model is released under the Apache-2.0 license.
| Property | Details |
| --- | --- |
| Model Type | Fine-tuned wav2vec2-xls-r-300m for Kazakh speech recognition |
| Training Data | MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - KK dataset |