🚀 wav2vec2-large-xls-r-300m-as-v9
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the common_voice dataset, intended for automatic speech recognition of Assamese (as) speech.
✨ Features
- Multilingual Adaptability: fine-tuned from a multilingually pretrained checkpoint on mozilla-foundation/common_voice_8_0 data, making it adaptable to low-resource language scenarios.
- Evaluation Metrics: performance is reported as word error rate (WER) and character error rate (CER).
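To make the reported metrics concrete, here is a minimal, illustrative sketch of how WER is computed: the word-level Levenshtein (edit) distance between reference and hypothesis, divided by the reference length. CER is the same computation over characters instead of words. This is not the card's actual evaluation script, just a self-contained reference implementation.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting i words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting j words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution over three reference words -> WER = 1/3
print(wer("the cat sat", "the bat sat"))
```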
📦 Installation
No installation steps are provided in the original document.
💻 Usage Examples
No code examples are provided in the original document.
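As a starting point, the checkpoint can be loaded with the transformers ASR pipeline. This is a hedged sketch, not an official example: the model id is taken from the evaluation command in this card, and it assumes an installed `transformers`/`torch` stack, network access for the first download, and 16 kHz mono input audio (the sampling rate wav2vec2 XLS-R was pretrained on).

```python
def transcribe(audio_path: str) -> str:
    """Transcribe an Assamese audio file with the fine-tuned checkpoint."""
    # Imported lazily; the model download happens on the first call.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model="DrishtiSharma/wav2vec2-large-xls-r-300m-as-v9",
    )
    # Input audio should be 16 kHz mono to match the model's expectations.
    return asr(audio_path)["text"]

# Example usage (assumes a local 16 kHz WAV file; path is hypothetical):
# print(transcribe("sample_assamese.wav"))
```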
📚 Documentation
Evaluation Command
- To evaluate on mozilla-foundation/common_voice_8_0 with the test split:
python eval.py --model_id DrishtiSharma/wav2vec2-large-xls-r-300m-as-v9 --dataset mozilla-foundation/common_voice_8_0 --config as --split test --log_outputs
- To evaluate on speech-recognition-community-v2/dev_data:
The Assamese (as) language is not available in speech-recognition-community-v2/dev_data.
Training hyperparameters
The following hyperparameters were used during training:
| Property | Details |
|----------|---------|
| learning_rate | 0.000111 |
| train_batch_size | 16 |
| eval_batch_size | 8 |
| seed | 42 |
| gradient_accumulation_steps | 2 |
| total_train_batch_size | 32 |
| optimizer | Adam with betas=(0.9, 0.999) and epsilon=1e-08 |
| lr_scheduler_type | linear |
| lr_scheduler_warmup_steps | 300 |
| num_epochs | 200 |
| mixed_precision_training | Native AMP |
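Two of the table's entries can be reproduced numerically: the effective batch size is the per-device batch size times the gradient accumulation steps, and `lr_scheduler_type = linear` means the learning rate ramps up over the warmup steps and then decays linearly to zero. The sketch below uses the table's values; the total step count of 3800 is taken from the last row of the training log below and is approximate.

```python
# Effective batch size, derived from the hyperparameter table above.
train_batch_size = 16
gradient_accumulation_steps = 2
total_train_batch_size = train_batch_size * gradient_accumulation_steps  # 32, as listed

def linear_lr(step: int, peak_lr: float = 0.000111,
              warmup_steps: int = 300, total_steps: int = 3800) -> float:
    """Linear warmup to peak_lr over warmup_steps, then linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

# The learning rate peaks at step 300 and reaches zero at the final step.
print(linear_lr(300))   # peak: 0.000111
print(linear_lr(3800))  # end of training: 0.0
```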
Training results
| Training Loss | Epoch | Step | Validation Loss | WER |
|---------------|-------|------|-----------------|-----|
| 8.3852 | 10.51 | 200 | 3.6402 | 1.0 |
| 3.5374 | 21.05 | 400 | 3.3894 | 1.0 |
| 2.8645 | 31.56 | 600 | 1.3143 | 0.8303 |
| 1.1784 | 42.1 | 800 | 0.9417 | 0.6661 |
| 0.7805 | 52.62 | 1000 | 0.9292 | 0.6237 |
| 0.5973 | 63.15 | 1200 | 0.9489 | 0.6014 |
| 0.4784 | 73.67 | 1400 | 0.9916 | 0.5962 |
| 0.4138 | 84.21 | 1600 | 1.0272 | 0.6121 |
| 0.3491 | 94.72 | 1800 | 1.0412 | 0.5984 |
| 0.3062 | 105.26 | 2000 | 1.0769 | 0.6005 |
| 0.2707 | 115.77 | 2200 | 1.0708 | 0.5752 |
| 0.2459 | 126.31 | 2400 | 1.1285 | 0.6009 |
| 0.2234 | 136.82 | 2600 | 1.1209 | 0.5949 |
| 0.2035 | 147.36 | 2800 | 1.1348 | 0.5842 |
| 0.1876 | 157.87 | 3000 | 1.1480 | 0.5872 |
| 0.1669 | 168.41 | 3200 | 1.1496 | 0.5838 |
| 0.1595 | 178.92 | 3400 | 1.1721 | 0.5778 |
| 0.1505 | 189.46 | 3600 | 1.1654 | 0.5744 |
| 0.1486 | 199.97 | 3800 | 1.1679 | 0.5761 |
Framework versions
| Property | Details |
|----------|---------|
| Transformers | 4.16.1 |
| Pytorch | 1.10.0+cu111 |
| Datasets | 1.18.2 |
| Tokenizers | 0.11.0 |
🔧 Technical Details
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the common_voice dataset, trained with the hyperparameters listed above. The final checkpoint reached a validation loss of 1.1679 and a WER of 0.5761 on the evaluation set.
📄 License
This project is licensed under the Apache-2.0 license.