wav2vec2-large-xls-r-300m-pa-IN-dx1
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - PA-IN dataset. It can be used for automatic speech recognition and achieves the WER and CER results reported below on the evaluation set.
Features
- Language Support: Specifically fine-tuned for Punjabi (pa-IN).
- Multiple Datasets: Evaluated on multiple datasets, including Common Voice 8 and Robust Speech Event - Dev Data.
- Performance Metrics: Reports word error rate (WER) and character error rate (CER) on the test set.
Installation
No installation steps are provided in the original document, so this section is skipped.
Usage Examples
No code examples are provided in the original document.
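Below is a minimal inference sketch, assuming the standard transformers automatic-speech-recognition pipeline; the model id is taken from the evaluation command in this card, while the audio file path and the 16 kHz mono input format are assumptions rather than details stated in the original card.

```python
from transformers import pipeline

# Load the fine-tuned Punjabi ASR model (model id taken from the evaluation command below)
asr = pipeline(
    "automatic-speech-recognition",
    model="DrishtiSharma/wav2vec2-large-xls-r-300m-pa-IN-dx1",
)

# Transcribe a local audio file; "sample_pa.wav" is a placeholder path (16 kHz mono audio expected)
result = asr("sample_pa.wav")
print(result["text"])
```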
Documentation
Model Information
| Property | Details |
|----------|---------|
| Model Name | wav2vec2-large-xls-r-300m-pa-IN-dx1 |
| Model Type | Fine-tuned from facebook/wav2vec2-xls-r-300m |
| Training Datasets | mozilla-foundation/common_voice_8_0 |
| Languages Supported | pa-IN |
Evaluation Results
This model achieves the following results on different evaluation sets:
| Task | Dataset | Test WER | Test CER |
|------|---------|----------|----------|
| Automatic Speech Recognition | Common Voice 8 (pa-IN) | 0.48725989807918463 | 0.1687305197540224 |
| Automatic Speech Recognition | Robust Speech Event - Dev Data (pa-IN) | NA | NA |
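For reference, WER and CER scores of this kind are typically computed with the Hugging Face `evaluate` library; the snippet below is only an illustrative sketch with placeholder strings, not a reproduction of the numbers above.

```python
import evaluate

# Word error rate and character error rate metrics from the `evaluate` library
wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

# Placeholder transcriptions; in practice these come from running the model on the test split
predictions = ["ਇਹ ਇੱਕ ਉਦਾਹਰਨ ਹੈ"]
references = ["ਇਹ ਇੱਕ ਉਦਾਹਰਣ ਹੈ"]

print("WER:", wer_metric.compute(predictions=predictions, references=references))
print("CER:", cer_metric.compute(predictions=predictions, references=references))
```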
Evaluation Commands
- Evaluate on mozilla-foundation/common_voice_8_0 with the test split:
python eval.py --model_id DrishtiSharma/wav2vec2-large-xls-r-300m-pa-IN-dx1 --dataset mozilla-foundation/common_voice_8_0 --config pa-IN --split test --log_outputs
- Evaluate on speech-recognition-community-v2/dev_data:
The Punjabi language isn't available in speech-recognition-community-v2/dev_data, so no results are reported for it.
Training Hyperparameters
The following hyperparameters were used during training:
- Learning Rate: 0.0003
- Train Batch Size: 16
- Eval Batch Size: 8
- Seed: 42
- Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- LR Scheduler Type: linear
- LR Scheduler Warmup Steps: 1200
- Number of Epochs: 100.0
- Mixed Precision Training: Native AMP
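As a non-authoritative illustration, these values roughly correspond to the following `TrainingArguments` in a standard `Trainer`-based fine-tuning script; the mapping is an assumption, and `output_dir` is a placeholder rather than a detail from the original training configuration.

```python
from transformers import TrainingArguments

# Sketch of the reported hyperparameters expressed as TrainingArguments
training_args = TrainingArguments(
    output_dir="wav2vec2-large-xls-r-300m-pa-IN-dx1",  # placeholder output directory
    learning_rate=3e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=1200,
    num_train_epochs=100.0,
    fp16=True,  # native AMP mixed-precision training
)
```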
Training Results
| Training Loss | Epoch | Step | Validation Loss | WER |
|---------------|-------|------|-----------------|-----|
| 3.4607 | 9.26 | 500 | 2.7746 | 1.0416 |
| 0.3442 | 18.52 | 1000 | 0.9114 | 0.5911 |
| 0.2213 | 27.78 | 1500 | 0.9687 | 0.5751 |
| 0.1242 | 37.04 | 2000 | 1.0204 | 0.5461 |
| 0.0998 | 46.3 | 2500 | 1.0250 | 0.5233 |
| 0.0727 | 55.56 | 3000 | 1.1072 | 0.5382 |
| 0.0605 | 64.81 | 3500 | 1.0588 | 0.5073 |
| 0.0458 | 74.07 | 4000 | 1.0818 | 0.5069 |
| 0.0338 | 83.33 | 4500 | 1.0948 | 0.5108 |
| 0.0223 | 92.59 | 5000 | 1.0986 | 0.4775 |
Framework Versions
- Transformers: 4.17.0.dev0
- Pytorch: 1.10.2+cu102
- Datasets: 1.18.2.dev0
- Tokenizers: 0.11.0
Technical Details
The model is a fine-tuned version of the pre-trained facebook/wav2vec2-xls-r-300m checkpoint. It was trained with the hyperparameters listed above and evaluated on multiple datasets to assess its performance on the Punjabi language.
License
This model is released under the Apache 2.0 license.