sammy786/wav2vec2-xlsr-georgian
This model is a fine-tuned version of facebook/wav2vec2-xls-r-1b on the mozilla-foundation/common_voice_8_0 Georgian (ka) dataset. It is designed for automatic speech recognition and achieves the results reported below on the evaluation sets.
Features
- This model is fine-tuned for the Georgian language on the Common Voice 8.0 dataset.
- It can be used for automatic speech recognition tasks, with specific performance metrics on different datasets.
Installation
No installation steps are provided in the original document, so this section is skipped.
Usage Examples
No code examples are provided in the original document.
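As an editorial illustration only, a minimal inference sketch is shown below. It assumes the standard transformers Wav2Vec2 API, torchaudio for audio loading, and a placeholder audio file path; it is not code from the model author.

```python
# Editorial sketch (not from the original card): transcribe a Georgian audio
# clip with this checkpoint. "example_georgian_clip.wav" is a placeholder path.
# Assumed dependencies: pip install transformers torch torchaudio
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "sammy786/wav2vec2-xlsr-georgian"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# Load the audio and resample it to the 16 kHz rate the model expects.
waveform, sample_rate = torchaudio.load("example_georgian_clip.wav")
waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)
speech = waveform.squeeze().numpy()

inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Greedy CTC decoding of the most likely token at each frame.
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```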
Documentation
Model description
"facebook/wav2vec2-xls-r-1b" was finetuned.
Intended uses & limitations
More information needed
Training and evaluation data
Training data - Common Voice Georgian train.tsv, dev.tsv and other.tsv
Training procedure
To create the train dataset, all available splits were concatenated and a 90-10 train/evaluation split was applied.
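As an illustration only (the card does not include the preprocessing script), such a concatenation and 90-10 split could look like the sketch below using the Hugging Face datasets library; the split names and the seed value are assumptions.

```python
# Editorial sketch of the 90-10 split described above; not the author's script.
# Assumes access to the gated mozilla-foundation/common_voice_8_0 dataset.
from datasets import concatenate_datasets, load_dataset

splits = [
    load_dataset("mozilla-foundation/common_voice_8_0", "ka", split=name, use_auth_token=True)
    for name in ("train", "validation", "other")
]
combined = concatenate_datasets(splits)

# 90% train / 10% evaluation; the fixed seed is an assumption for reproducibility.
split = combined.train_test_split(test_size=0.1, seed=13)
train_ds, eval_ds = split["train"], split["test"]
```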
Training hyperparameters
The following hyperparameters were used during training (an illustrative mapping onto TrainingArguments follows the list):
- learning_rate: 0.000045637994662983496
- train_batch_size: 8
- eval_batch_size: 16
- seed: 13
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: cosine_with_restarts
- lr_scheduler_warmup_steps: 500
- num_epochs: 30
- mixed_precision_training: Native AMP
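For readers unfamiliar with how these values map onto the transformers Trainer, the sketch below expresses the list above as a TrainingArguments configuration. The output directory is a placeholder, and this is not the author's actual training script.

```python
# Editorial sketch: the hyperparameters above expressed as TrainingArguments.
# output_dir is a placeholder; this is not the author's actual training script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./wav2vec2-xlsr-georgian",
    learning_rate=4.5637994662983496e-05,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=13,
    gradient_accumulation_steps=4,   # effective total train batch size of 32
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine_with_restarts",
    warmup_steps=500,
    num_train_epochs=30,
    fp16=True,                       # native AMP mixed-precision training
)
```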
Training results
| Step | Training Loss | Validation Loss | WER |
|------|---------------|-----------------|----------|
| 200  | 4.152100 | 0.823672 | 0.967814 |
| 400  | 0.889500 | 0.196740 | 0.444792 |
| 600  | 0.493700 | 0.155659 | 0.366115 |
| 800  | 0.328000 | 0.138066 | 0.358069 |
| 1000 | 0.260600 | 0.119236 | 0.324989 |
| 1200 | 0.217200 | 0.114050 | 0.313366 |
| 1400 | 0.188800 | 0.112600 | 0.302190 |
| 1600 | 0.166900 | 0.111154 | 0.295485 |
| 1800 | 0.155500 | 0.109963 | 0.286544 |
| 2000 | 0.140400 | 0.107587 | 0.277604 |
| 2200 | 0.142600 | 0.105662 | 0.277157 |
| 2400 | 0.135400 | 0.105414 | 0.275369 |
Framework versions
- Transformers 4.16.0.dev0
- Pytorch 1.10.0+cu102
- Datasets 1.17.1.dev0
- Tokenizers 0.10.3
Evaluation Commands
- To evaluate on mozilla-foundation/common_voice_8_0 with split test:
python eval.py --model_id sammy786/wav2vec2-xlsr-georgian --dataset mozilla-foundation/common_voice_8_0 --config ka --split test
Technical Details
The model is fine-tuned on the mozilla-foundation/common_voice_8_0 Georgian (ka) dataset. During training, all available splits were combined and a 90-10 split was used to create the training and evaluation sets. The hyperparameters listed above were used, and the model's performance was evaluated on several datasets, with WER and CER reported.
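As a point of reference for the WER and CER figures mentioned here and in the model index below, the sketch beneath shows one common way to compute these metrics, assuming the wer and cer metrics bundled with the datasets library; the strings are placeholders rather than real evaluation data.

```python
# Editorial sketch of computing WER/CER; the strings below are placeholders only.
# Assumed dependencies: pip install datasets jiwer
from datasets import load_metric

wer_metric = load_metric("wer")
cer_metric = load_metric("cer")

references = ["this is a reference transcription"]
predictions = ["this is a predicted transcription"]

print("WER:", wer_metric.compute(predictions=predictions, references=references))
print("CER:", cer_metric.compute(predictions=predictions, references=references))
```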
License
The model is released under the Apache 2.0 license.
Model Index
| Property | Details |
|----------|---------|
| Model Name | sammy786/wav2vec2-xlsr-georgian |
| Task | Automatic Speech Recognition |
| Datasets | Common Voice 8 (mozilla-foundation/common_voice_8_0 - ka); Robust Speech Event - Dev Data (speech-recognition-community-v2/dev_data - ka); Robust Speech Event - Test Data (speech-recognition-community-v2/eval_data - ka) |
| Metrics | Common Voice 8: Test WER = 23.9, Test CER = 3.59; Robust Speech Event - Dev Data: Test WER = 75.07; Robust Speech Event - Test Data: Test WER = 74.41 |