# wav2vec2-large-xls-r-300m-sr-v4
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the mozilla-foundation/common_voice_8_0 - SR dataset. It is designed for automatic speech recognition, providing high-quality speech-to-text conversion.
## Features
- Fine-tuned Model: Based on the pre-trained facebook/wav2vec2-xls-r-300m model, fine-tuned on the mozilla-foundation/common_voice_8_0 - SR dataset.
- Multiple Evaluation Metrics: Evaluated on multiple datasets with metrics such as loss, WER (Word Error Rate), and CER (Character Error Rate); a short metric-computation sketch follows this list.
- Detailed Training Information: Provides comprehensive training hyperparameters and training results.
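As an illustration of how WER and CER are typically computed, the sketch below uses the Hugging Face `evaluate` library; the reference and prediction strings are invented placeholders, not outputs of this model.

```python
# Requires: pip install evaluate jiwer
import evaluate

# Hypothetical reference/prediction pair, purely for illustration.
references = ["добар дан свима"]
predictions = ["добар дан свимо"]

wer = evaluate.load("wer")  # word error rate
cer = evaluate.load("cer")  # character error rate

print("WER:", wer.compute(references=references, predictions=predictions))
print("CER:", cer.compute(references=references, predictions=predictions))
```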
## Installation
No installation steps are provided in the original document.
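A typical environment for running this model might be set up as follows (a sketch only, assuming the standard Transformers/PyTorch stack listed under Framework versions below):

```bash
pip install transformers torch datasets librosa
```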
## Usage Examples
No code examples are provided in the original document.
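A minimal inference sketch with the Transformers `pipeline` API might look like the following; the audio file path is a placeholder and is not part of the original document.

```python
from transformers import pipeline

# Load the fine-tuned checkpoint as an automatic-speech-recognition pipeline.
asr = pipeline(
    "automatic-speech-recognition",
    model="DrishtiSharma/wav2vec2-large-xls-r-300m-sr-v4",
)

# Transcribe a local audio file (placeholder path); 16 kHz mono audio works best.
result = asr("path/to/serbian_audio.wav")
print(result["text"])
```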
## Documentation
### Evaluation Commands
- Evaluate on mozilla-foundation/common_voice_8_0 with the test split:

  ```bash
  python eval.py --model_id DrishtiSharma/wav2vec2-large-xls-r-300m-sr-v4 --dataset mozilla-foundation/common_voice_8_0 --config sr --split test --log_outputs
  ```
- Evaluate on speech-recognition-community-v2/dev_data:

  ```bash
  python eval.py --model_id DrishtiSharma/wav2vec2-large-xls-r-300m-sr-v4 --dataset speech-recognition-community-v2/dev_data --config sr --split validation --chunk_length_s 10 --stride_length_s 1
  ```
### Model Performance

The results on the evaluation datasets are summarized in the model index below.

#### Model Index
| Task | Dataset | Metric | Value |
|------|---------|--------|-------|
| Automatic Speech Recognition | Common Voice 8 (mozilla-foundation/common_voice_8_0, config `sr`) | Test WER | 0.303313 |
| Automatic Speech Recognition | Common Voice 8 (mozilla-foundation/common_voice_8_0, config `sr`) | Test CER | 0.1048951 |
| Automatic Speech Recognition | Robust Speech Event - Dev Data (speech-recognition-community-v2/dev_data, config `sr`) | Test WER | 0.9486784706184245 |
| Automatic Speech Recognition | Robust Speech Event - Dev Data (speech-recognition-community-v2/dev_data, config `sr`) | Test CER | 0.8084369606584945 |
| Automatic Speech Recognition | Robust Speech Event - Test Data (speech-recognition-community-v2/eval_data, config `sr`) | Test WER | 94.53 |
### Training hyperparameters
The following hyperparameters were used during training:
| Hyperparameter | Value |
|----------------|-------|
| learning_rate | 0.0003 |
| train_batch_size | 16 |
| eval_batch_size | 8 |
| seed | 42 |
| gradient_accumulation_steps | 2 |
| total_train_batch_size | 32 |
| optimizer | Adam with betas=(0.9, 0.999) and epsilon=1e-08 |
| lr_scheduler_type | linear |
| lr_scheduler_warmup_steps | 800 |
| num_epochs | 200 |
| mixed_precision_training | Native AMP |
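For reference, these hyperparameters roughly correspond to the following `TrainingArguments` configuration. This is a sketch only: the output directory is a placeholder, and argument names follow the Transformers 4.16 API.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./wav2vec2-large-xls-r-300m-sr-v4",  # placeholder path
    learning_rate=3e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,  # effective total train batch size of 32
    num_train_epochs=200,
    lr_scheduler_type="linear",
    warmup_steps=800,
    seed=42,
    fp16=True,  # native AMP mixed-precision training
)
```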
### Training results
| Training Loss | Epoch | Step | Validation Loss | WER |
|---------------|-------|------|-----------------|-----|
| 8.2934 | 7.5 | 300 | 2.9777 | 0.9995 |
| 1.5049 | 15.0 | 600 | 0.5036 | 0.4806 |
| 0.3263 | 22.5 | 900 | 0.5822 | 0.4055 |
| 0.2008 | 30.0 | 1200 | 0.5609 | 0.4032 |
| 0.1543 | 37.5 | 1500 | 0.5203 | 0.3710 |
| 0.1158 | 45.0 | 1800 | 0.6458 | 0.3985 |
| 0.0997 | 52.5 | 2100 | 0.6227 | 0.4013 |
| 0.0834 | 60.0 | 2400 | 0.6048 | 0.3836 |
| 0.0665 | 67.5 | 2700 | 0.6197 | 0.3686 |
| 0.0602 | 75.0 | 3000 | 0.5418 | 0.3453 |
| 0.0524 | 82.5 | 3300 | 0.5310 | 0.3486 |
| 0.0445 | 90.0 | 3600 | 0.5599 | 0.3374 |
| 0.0406 | 97.5 | 3900 | 0.5958 | 0.3327 |
| 0.0358 | 105.0 | 4200 | 0.6017 | 0.3262 |
| 0.0302 | 112.5 | 4500 | 0.5613 | 0.3248 |
| 0.0285 | 120.0 | 4800 | 0.5659 | 0.3462 |
| 0.0213 | 127.5 | 5100 | 0.5568 | 0.3206 |
| 0.0215 | 135.0 | 5400 | 0.6524 | 0.3472 |
| 0.0162 | 142.5 | 5700 | 0.6223 | 0.3458 |
| 0.0137 | 150.0 | 6000 | 0.6625 | 0.3313 |
| 0.0114 | 157.5 | 6300 | 0.5739 | 0.3336 |
| 0.0101 | 165.0 | 6600 | 0.5906 | 0.3285 |
| 0.008 | 172.5 | 6900 | 0.5982 | 0.3112 |
| 0.0076 | 180.0 | 7200 | 0.5399 | 0.3094 |
| 0.0071 | 187.5 | 7500 | 0.5387 | 0.2991 |
| 0.0057 | 195.0 | 7800 | 0.5570 | 0.3038 |
### Framework versions
- Transformers 4.16.2
- Pytorch 1.10.0+cu111
- Datasets 1.18.2
- Tokenizers 0.11.0
## Technical Details
The model is fine-tuned on the mozilla-foundation/common_voice_8_0 - SR dataset. During training, the hyperparameters were selected to optimize performance: the learning rate is 0.0003 with a per-device training batch size of 16 (32 effective with gradient accumulation), the optimizer is Adam with betas=(0.9, 0.999) and epsilon=1e-08, and the learning-rate scheduler is linear with 800 warm-up steps.
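The optimizer and schedule described above can be sketched in plain PyTorch as follows. This is illustrative only: the base checkpoint is loaded as a stand-in for the actual fine-tuning setup, the AdamW variant is assumed, and the total step count is a placeholder inferred loosely from the training log.

```python
import torch
from transformers import Wav2Vec2ForCTC, get_linear_schedule_with_warmup

# Placeholder model load; the real fine-tuning adds a CTC head sized to the Serbian vocabulary.
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-xls-r-300m")

# Adam-style optimizer with the betas/epsilon listed in the hyperparameter table.
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=3e-4,
    betas=(0.9, 0.999),
    eps=1e-8,
)

# Linear schedule with 800 warm-up steps; num_training_steps is a rough placeholder
# (~40 optimizer steps per epoch * 200 epochs, judging from the training log).
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=800,
    num_training_steps=8000,
)
```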
## License
This model is released under the Apache 2.0 license.