🚀 wav2vec2-large-xls-r-300m-sat-a3
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - SAT dataset. It is intended for automatic speech recognition, providing speech-to-text conversion for the Santali (sat) language.
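For illustration, a minimal inference sketch follows. It assumes the transformers and librosa packages and uses the model ID from the evaluation command further down; the audio path and 16 kHz resampling are placeholders, not part of this card.

```python
# Hedged usage sketch: greedy CTC decoding with this fine-tuned XLS-R checkpoint.
# "sample.wav" is a placeholder path; the model ID is taken from the evaluation command below.
import torch
import librosa
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "DrishtiSharma/wav2vec2-large-xls-r-300m-sat-a3"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# Wav2Vec2 expects 16 kHz mono audio.
speech, sr = librosa.load("sample.wav", sr=16_000)

inputs = processor(speech, sampling_rate=sr, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Pick the most likely token per frame; batch_decode collapses repeats and blanks.
pred_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(pred_ids)[0])
```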
✨ Features
- Fine-tuned Model: Based on the pre-trained facebook/wav2vec2-xls-r-300m checkpoint, fine-tuned on the sat subset of mozilla-foundation/common_voice_8_0 for Santali speech recognition.
- Multi-metric Evaluation: Evaluated with both WER (Word Error Rate) and CER (Character Error Rate) to measure performance comprehensively.
📚 Documentation
Model Information
| Property | Details |
|----------|---------|
| Model Type | wav2vec2-large-xls-r-300m-sat-a3 |
| Training Data | mozilla-foundation/common_voice_8_0 |
| License | apache-2.0 |
| Tags | automatic-speech-recognition, mozilla-foundation/common_voice_8_0, generated_from_trainer, sat, robust-speech-event, model_for_talk, hf-asr-leaderboard |
Evaluation Results
The model achieves the following results on the evaluation datasets:
| Task | Dataset | Test WER | Test CER |
|------|---------|----------|----------|
| Automatic Speech Recognition | Common Voice 8 (mozilla-foundation/common_voice_8_0, args: sat) | 0.357429718875502 | 0.14203730272596843 |
| Automatic Speech Recognition | Robust Speech Event - Dev Data (speech-recognition-community-v2/dev_data, args: sat) | NA | NA |
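WER and CER figures of the kind reported above can be reproduced with any standard implementation of these metrics. The sketch below uses the jiwer package on placeholder transcripts; jiwer is an assumed dependency, not something listed in this card.

```python
# Hedged sketch: computing WER and CER with jiwer (assumed dependency).
import jiwer

references  = ["example reference transcript"]   # ground-truth transcripts (placeholders)
predictions = ["example predicted transcript"]   # model outputs (placeholders)

wer = jiwer.wer(references, predictions)   # word error rate
cer = jiwer.cer(references, predictions)   # character error rate
print(f"WER: {wer:.4f}  CER: {cer:.4f}")
```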
Evaluation Commands
- Evaluate on mozilla-foundation/common_voice_8_0 with the test split:

```bash
python eval.py --model_id DrishtiSharma/wav2vec2-large-xls-r-300m-sat-a3 --dataset mozilla-foundation/common_voice_8_0 --config sat --split test --log_outputs
```

- Evaluate on speech-recognition-community-v2/dev_data

Note: the Santali (Ol Chiki) language was not found in speech-recognition-community-v2/dev_data, so no evaluation command is provided.
Training Hyperparameters
The following hyperparameters were used during training (a hedged mapping onto the Trainer API is sketched after this list):
- learning_rate: 0.0004
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 200
- num_epochs: 200
- mixed_precision_training: Native AMP
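For reference, the sketch below shows how the hyperparameters listed above might be expressed as transformers.TrainingArguments. The actual training script is not included in this card, and the output directory is a placeholder.

```python
# Hedged sketch: the listed hyperparameters expressed as Trainer arguments
# (argument names follow the transformers Trainer API; output_dir is a placeholder).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./wav2vec2-large-xls-r-300m-sat-a3",
    learning_rate=4e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,   # effective train batch size: 16 * 2 = 32
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=200,
    num_train_epochs=200,
    fp16=True,                       # native AMP mixed-precision training
)
```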
Training Results
| Training Loss | Epoch | Step | Validation Loss | WER |
|---------------|-------|------|-----------------|-----|
| 11.1266 | 33.29 | 100 | 2.8577 | 1.0 |
| 2.1549 | 66.57 | 200 | 1.0799 | 0.5542 |
| 0.5628 | 99.86 | 300 | 0.7973 | 0.4016 |
| 0.0779 | 133.29 | 400 | 0.8424 | 0.4177 |
| 0.0404 | 166.57 | 500 | 0.9048 | 0.4137 |
| 0.0212 | 199.86 | 600 | 0.8961 | 0.3976 |
Framework Versions
- Transformers 4.16.2
- Pytorch 1.10.0+cu111
- Datasets 1.18.3
- Tokenizers 0.11.0