Cdial/Hausa_xlsr
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m for automatic speech recognition, achieving strong results on Hausa-language datasets.
Quick Start
Evaluation
To evaluate the model on the `test` split of mozilla-foundation/common_voice_8_0:

```shell
python eval.py --model_id Akashpb13/Hausa_xlsr --dataset mozilla-foundation/common_voice_8_0 --config ha --split test
```
Features
- Fine-Tuned Model: Based on facebook/wav2vec2-xls-r-300m, fine-tuned for better performance on Hausa language tasks.
- High Performance: Achieves strong WER and CER scores on the relevant evaluation datasets.
Usage Examples
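As a minimal inference sketch (not from the original card), the checkpoint can be used with 🤗 Transformers. This assumes the Hub model id `Akashpb13/Hausa_xlsr` (the id used in the evaluation command) and a mono WAV input; `transcribe` is an illustrative helper name, and the heavy imports are kept inside it so the file can be read or imported without the dependencies installed.

```python
TARGET_SAMPLE_RATE = 16_000  # wav2vec2-xls-r checkpoints expect 16 kHz audio


def transcribe(wav_path: str, model_id: str = "Akashpb13/Hausa_xlsr") -> str:
    """Load the fine-tuned checkpoint and transcribe one audio file."""
    # Imports kept local: requires torch, torchaudio, and transformers.
    import torch
    import torchaudio
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)

    waveform, sample_rate = torchaudio.load(wav_path)
    if sample_rate != TARGET_SAMPLE_RATE:
        # Resample to the rate the model was trained on.
        waveform = torchaudio.functional.resample(
            waveform, sample_rate, TARGET_SAMPLE_RATE
        )

    inputs = processor(
        waveform.squeeze(0), sampling_rate=TARGET_SAMPLE_RATE, return_tensors="pt"
    )
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Usage would look like `print(transcribe("sample_ha.wav"))` for some local Hausa recording.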
Documentation
Model Information
| Property | Details |
|----------|---------|
| Model Name | Cdial/Hausa_xlsr |
| Base Model | facebook/wav2vec2-xls-r-300m |
| Language | Hausa (ha) |
| Task | Automatic Speech Recognition |
| License | Apache-2.0 |
| Tags | automatic-speech-recognition, mozilla-foundation/common_voice_8_0, generated_from_trainer, ha, robust-speech-event, model_for_talk, hf-asr-leaderboard |
| Datasets | mozilla-foundation/common_voice_8_0 |
Evaluation Results
The model achieves the following results on different evaluation sets:
- Common Voice 8 (mozilla-foundation/common_voice_8_0, ha):
  - Test WER: 0.20614541257934219
  - Test CER: 0.04358048053214061
- Robust Speech Event - Dev Data (speech-recognition-community-v2/dev_data, ha):
  - Test WER: 0.20614541257934219
  - Test CER: 0.04358048053214061
On the evaluation set (a 10% split of the training data, where the training data is the train set merged with the invalidated, reported, other, and dev sets):
- Loss: 0.275118
- WER: 0.329955
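For reference, the WER and CER figures above are normalized Levenshtein edit distances computed over words and characters, respectively. A stdlib-only sketch of the metrics (not the exact script used to produce the numbers):

```python
def _edit_distance(ref, hyp):
    """Classic dynamic-programming Levenshtein distance between two sequences."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, start=1):
            cur = min(
                dp[j] + 1,        # deletion
                dp[j - 1] + 1,    # insertion
                prev + (r != h),  # substitution (free if tokens match)
            )
            prev, dp[j] = dp[j], cur
    return dp[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference length."""
    ref_words = reference.split()
    return _edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance over reference length."""
    return _edit_distance(reference, hypothesis) / len(reference)
```

So a WER of ~0.206 means roughly one word in five needs an edit to match the reference transcript.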
Model Description
The model is facebook/wav2vec2-xls-r-300m fine-tuned on Hausa speech data.
Intended Uses & Limitations
More information needed
Training and Evaluation Data
- Training Data: Common Voice Hausa train.tsv, dev.tsv, invalidated.tsv, reported.tsv, and other.tsv. The files given in Common Voice 7.0 were concatenated, duplicates were removed, and only utterances with more upvotes than downvotes were kept.
Training Procedure
- Dataset Creation: All available datasets were concatenated and a 90-10 train/evaluation split was applied.
- Training Hyperparameters:
- learning_rate: 0.000096
- train_batch_size: 16
- eval_batch_size: 16
- seed: 13
- gradient_accumulation_steps: 2
- lr_scheduler_type: cosine_with_restarts
- lr_scheduler_warmup_steps: 500
- num_epochs: 50
- mixed_precision_training: Native AMP
- Training Results:
| Step | Training Loss | Validation Loss | Wer |
|------|---------------|-----------------|----------|
| 500 | 5.175900 | 2.750914 | 1.000000 |
| 1000 | 1.028700 | 0.338649 | 0.497999 |
| 1500 | 0.332200 | 0.246896 | 0.402241 |
| 2000 | 0.227300 | 0.239640 | 0.395839 |
| 2500 | 0.175000 | 0.239577 | 0.373966 |
| 3000 | 0.140400 | 0.243272 | 0.356095 |
| 3500 | 0.119200 | 0.263761 | 0.365164 |
| 4000 | 0.099300 | 0.265954 | 0.353428 |
| 4500 | 0.084400 | 0.276367 | 0.349693 |
| 5000 | 0.073700 | 0.282631 | 0.343825 |
| 5500 | 0.068000 | 0.282344 | 0.341158 |
| 6000 | 0.064500 | 0.281591 | 0.342491 |
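The dataset-creation step above (concatenate everything, then split 90-10) can be sketched as follows. This is an assumption-laden illustration, not the original code; in particular, using the listed seed (13) for the shuffle is my assumption.

```python
import random


def split_90_10(rows, seed=13):
    """Shuffle deterministically and return (train, eval) at a 90-10 ratio."""
    rows = list(rows)
    rng = random.Random(seed)  # seed 13, as listed in the hyperparameters
    rng.shuffle(rows)
    cut = int(0.9 * len(rows))
    return rows[:cut], rows[cut:]
```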
Framework Versions
- Transformers 4.16.0.dev0
- Pytorch 1.10.0+cu102
- Datasets 1.18.3
- Tokenizers 0.10.3
Technical Details
The model is a fine-tuned version of facebook/wav2vec2-xls-r-300m: fine-tuning adjusts the pretrained model's parameters on Hausa speech data to improve its automatic speech recognition performance.
License
This model is released under the Apache-2.0 license.