sammy786/wav2vec2-xlsr-basaa
This model is a fine-tuned version of facebook/wav2vec2-xls-r-1b on the Basaa (bas) subset of the mozilla-foundation/common_voice_8_0 dataset. It performs automatic speech recognition, transcribing Basaa speech to text. Its validation loss and word error rate (WER) are reported below.
Quick Start
Results are reported on an evaluation set made of 10 percent of the combined Common Voice train, dev and other data. The per-step validation loss and WER are listed under Training results.
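Like other fine-tuned wav2vec2 ASR checkpoints, this model presumably uses a CTC head, so its raw output is one token prediction per audio frame. The following plain-Python snippet is a minimal sketch of greedy CTC decoding, the step that turns those per-frame argmax ids into text. The vocabulary and frame ids are made up for illustration. Treating `<pad>` as the CTC blank and `|` as the word delimiter is an assumption based on typical Hugging Face wav2vec2 tokenizers. In practice, `Wav2Vec2Processor.batch_decode` performs this step on the model's argmax ids.

```python
from typing import Dict, List, Optional

# Hypothetical toy vocabulary; the real one ships with the model's tokenizer.
# "<pad>" acts as the CTC blank and "|" marks word boundaries (assumed convention).
VOCAB: Dict[int, str] = {0: "<pad>", 1: "|", 2: "a", 3: "b", 4: "s"}
BLANK_ID = 0
WORD_DELIMITER = "|"


def ctc_greedy_decode(frame_ids: List[int]) -> str:
    """Turn per-frame argmax ids into text: collapse repeats, then drop blanks."""
    chars: List[str] = []
    previous: Optional[int] = None
    for token_id in frame_ids:
        # Consecutive identical ids are one emitted symbol; blanks emit nothing.
        if token_id != previous and token_id != BLANK_ID:
            chars.append(VOCAB[token_id])
        previous = token_id
    # Word delimiters become spaces; trim any at the edges.
    return "".join(chars).replace(WORD_DELIMITER, " ").strip()


# Made-up frame predictions; the blank (0) separates the two "a" symbols.
frames = [3, 3, 2, 2, 0, 2, 4, 4, 2]
print(ctc_greedy_decode(frames))  # → baasa
```

The blank token is what lets CTC emit a genuinely doubled character: without the `0` between them, the two runs of `2` would collapse into a single "a".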
Features
- Fine-tuned model: based on facebook/wav2vec2-xls-r-1b and fine-tuned on the Basaa (bas) subset of mozilla-foundation/common_voice_8_0.
- Performance metrics: validation loss and WER are logged every 200 steps on the held-out evaluation set. The lowest logged WER is 0.343648, at step 1400.
Documentation
Model description
The pretrained facebook/wav2vec2-xls-r-1b model was fine-tuned on Basaa speech from Common Voice 8.0.
Intended uses & limitations
More information needed
Training and evaluation data
Training data: Common Voice 8.0 Basaa (bas) train.tsv, dev.tsv and other.tsv.
Training procedure
To create the training set, all available splits (train, dev and other) were concatenated. The result was then split 90/10 into training and evaluation sets.
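The merge-then-split procedure described above can be sketched in plain Python as follows. The card does not say how the data was shuffled or which seed was used. Reusing seed 13 from the hyperparameter list is an assumption, and the row lists are hypothetical stand-ins for the TSV contents. With Hugging Face `datasets`, the same idea is usually expressed with `concatenate_datasets` and `Dataset.train_test_split(test_size=0.1, seed=...)`, though the card does not confirm which tools the author used.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def merge_and_split(
    splits: Sequence[Sequence[T]], eval_fraction: float = 0.1, seed: int = 13
) -> Tuple[List[T], List[T]]:
    """Concatenate all splits, shuffle, and hold out `eval_fraction` for evaluation."""
    merged: List[T] = [row for split in splits for row in split]
    rng = random.Random(seed)  # seeded so the split is reproducible (assumed seed)
    rng.shuffle(merged)
    n_eval = int(round(len(merged) * eval_fraction))
    # Returns (train, eval).
    return merged[n_eval:], merged[:n_eval]


# Hypothetical stand-ins for the rows of train.tsv, dev.tsv and other.tsv.
train_rows = [f"train_{i}" for i in range(60)]
dev_rows = [f"dev_{i}" for i in range(20)]
other_rows = [f"other_{i}" for i in range(20)]

train_set, eval_set = merge_and_split([train_rows, dev_rows, other_rows])
print(len(train_set), len(eval_set))  # → 90 10
```

Because rows from dev and other are mixed into both sides, the evaluation set here is not the official Common Voice dev split. For that reason, the official test split is evaluated separately with the command under Evaluation Commands.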
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.000045637994662983496
- train_batch_size: 16
- eval_batch_size: 16
- seed: 13
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine_with_restarts
- lr_scheduler_warmup_steps: 500
- num_epochs: 70
- mixed_precision_training: Native AMP
Training results
| Step | Training Loss | Validation Loss | WER |
|-----:|--------------:|----------------:|----:|
| 200 | 6.734100 | 1.605006 | 0.980456 |
| 400 | 1.011200 | 0.364686 | 0.442997 |
| 600 | 0.709300 | 0.300204 | 0.377850 |
| 800 | 0.469800 | 0.315612 | 0.405537 |
| 1000 | 0.464700 | 0.352494 | 0.372964 |
| 1200 | 0.421900 | 0.342533 | 0.368078 |
| 1400 | 0.401900 | 0.351398 | 0.343648 |
| 1600 | 0.429800 | 0.350570 | 0.348534 |
| 1800 | 0.352600 | 0.356601 | 0.358306 |
| 2000 | 0.387200 | 0.355814 | 0.356678 |
| 2200 | 0.362400 | 0.345573 | 0.355049 |
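The WER column is the word error rate, expressed as a fraction. It is the number of word substitutions, deletions and insertions needed to turn the reference transcript into the model's output, divided by the number of reference words. The following plain-Python snippet is an illustrative sketch of that definition, not the exact evaluation code. To our understanding, the `wer` metric in Hugging Face Datasets (backed by jiwer) also pools errors and reference words over the whole corpus rather than averaging per utterance. The example sentences are made up.

```python
from typing import List, Sequence


def word_edit_distance(reference: Sequence[str], hypothesis: Sequence[str]) -> int:
    """Minimum substitutions + deletions + insertions turning reference into hypothesis."""
    prev = list(range(len(hypothesis) + 1))  # distances from an empty reference
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1]


def corpus_wer(references: List[str], hypotheses: List[str]) -> float:
    """Total word errors divided by total reference words (corpus-level WER)."""
    errors = sum(word_edit_distance(r.split(), h.split())
                 for r, h in zip(references, hypotheses))
    ref_words = sum(len(r.split()) for r in references)
    return errors / ref_words


refs = ["a b c d", "e f"]
hyps = ["a x c", "e f g"]  # 1 substitution + 1 deletion, then 1 insertion
print(corpus_wer(refs, hyps))  # → 0.5
```

Because insertions count as errors, WER can exceed 1.0. This explains why the step-200 value of 0.980456 is close to, but not bounded by, 1.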
Framework versions
- Transformers 4.16.0.dev0
- Pytorch 1.10.0+cu102
- Datasets 1.17.1.dev0
- Tokenizers 0.10.3
Evaluation Commands
- To evaluate on mozilla-foundation/common_voice_8_0 with split test:
python eval.py --model_id sammy786/wav2vec2-xlsr-basaa --dataset mozilla-foundation/common_voice_8_0 --config bas --split test
License
This project is under the Apache-2.0 license.
| Property | Details |
|----------|---------|
| Model Type | Fine-tuned version of facebook/wav2vec2-xls-r-1b on the Basaa (bas) subset of mozilla-foundation/common_voice_8_0 |
| Training Data | Common Voice 8.0 Basaa (bas) train.tsv, dev.tsv and other.tsv |