wav2vec2-large-xls-r-300m-latvian
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m for high-quality automatic speech recognition in Latvian.
🚀 Quick Start
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the mozilla-foundation/common_voice_7_0 - LV dataset. Its results on the evaluation sets are listed under Model Performance below.
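A minimal usage sketch (not part of the original card) with the transformers ASR pipeline; the hub id below is a placeholder for the actual repository name, and "sample_lv.wav" is a hypothetical local audio file:

```python
# Minimal usage sketch (assumption: the model is published on the Hugging Face
# Hub; replace the placeholder id with the actual repository name).
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="<username>/wav2vec2-large-xls-r-300m-latvian",  # placeholder hub id
)

# "sample_lv.wav" is a hypothetical 16 kHz Latvian audio file.
result = asr("sample_lv.wav")
print(result["text"])
```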
✨ Features
- Multilingual Adaptability: Based on the pre-trained model facebook/wav2vec2-xls-r-300m, it can adapt to different language environments.
- High-precision Recognition: Achieved low WER and CER on multiple datasets, demonstrating high-precision speech recognition capabilities.
📦 Installation
No installation steps are given in the original card; as a working assumption, a standard setup such as `pip install transformers torch torchaudio` is enough to run the Quick Start example above.
📚 Documentation
Model Information
| Property | Details |
|---|---|
| Model Type | Fine-tuned version of facebook/wav2vec2-xls-r-300m |
| Training Data | mozilla-foundation/common_voice_7_0 |
| Tags | automatic-speech-recognition, generated_from_trainer, hf-asr-leaderboard, lv, model_for_talk, mozilla-foundation/common_voice_7_0, robust-speech-event |
| License | apache-2.0 |
Model Performance
The model has been evaluated on multiple datasets, with the following results (WER and CER are percentages):
- Common Voice 7:
  - Test WER: 16.977
  - Test CER: 4.23
- Robust Speech Event - Dev Data:
  - Test WER: 45.247
  - Test CER: 16.924
- Robust Speech Event - Test Data:
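WER/CER figures like these can be reproduced from reference and predicted transcripts with a tool such as jiwer; a minimal sketch follows (jiwer is an assumption, the original card does not name an evaluation tool):

```python
# Sketch of computing WER/CER from reference and predicted transcripts
# with jiwer (assumed tooling; not named in the original card).
import jiwer

references = ["viņš runā latviski"]   # ground-truth transcripts
hypotheses = ["viņš runā latviski"]   # model predictions

wer = jiwer.wer(references, hypotheses)  # word error rate (fraction)
cer = jiwer.cer(references, hypotheses)  # character error rate (fraction)
print(f"WER: {wer * 100:.3f}%, CER: {cer * 100:.3f}%")
```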
Training Procedure
Training Hyperparameters
The following hyperparameters were used during training (see the TrainingArguments sketch after this list):
- learning_rate: 7e-05
- train_batch_size: 32
- eval_batch_size: 1
- seed: 42
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2000
- num_epochs: 100.0
- mixed_precision_training: Native AMP
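These hyperparameters map onto transformers TrainingArguments roughly as follows. This is a sketch assuming the Hugging Face Trainer was used (suggested by the generated_from_trainer tag); the output_dir is hypothetical:

```python
# Sketch: the listed hyperparameters expressed as TrainingArguments.
# Assumptions: HF Trainer workflow; output_dir is hypothetical.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./wav2vec2-large-xls-r-300m-latvian",  # hypothetical
    learning_rate=7e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=2000,
    num_train_epochs=100.0,
    fp16=True,  # "Native AMP" mixed-precision training
)
```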
Training Results
| Training Loss | Epoch | Step | Validation Loss | Validation WER |
|---|---|---|---|---|
| 1.4235 | 12.82 | 2000 | 0.4475 | 0.4551 |
| 0.9383 | 25.64 | 4000 | 0.2235 | 0.2328 |
| 0.8359 | 38.46 | 6000 | 0.2004 | 0.2098 |
| 0.7633 | 51.28 | 8000 | 0.1960 | 0.1882 |
| 0.7001 | 64.1 | 10000 | 0.1902 | 0.1809 |
| 0.652 | 76.92 | 12000 | 0.1979 | 0.1775 |
| 0.6025 | 89.74 | 14000 | 0.1866 | 0.1696 |
Framework Versions
- Transformers 4.16.0.dev0
- Pytorch 1.10.1+cu102
- Datasets 1.17.1.dev0
- Tokenizers 0.11.0
🔧 Technical Details
The model is based on the pre-trained facebook/wav2vec2-xls-r-300m checkpoint and fine-tuned on the mozilla-foundation/common_voice_7_0 dataset. Training used the hyperparameters listed above (learning rate 7e-05, batch size 32, 2000 warmup steps, etc.); over 100 epochs the model converged steadily, with validation WER falling from 0.4551 at step 2000 to 0.1696 at step 14000.
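To make the learning-rate schedule concrete, here is an illustrative sketch of the linear warmup/decay described above. Both the get_linear_schedule_with_warmup helper and the 14000 total steps (taken from the last row of the training table) are assumptions for illustration:

```python
# Illustrative sketch of the linear LR schedule: 2000 warmup steps, then a
# linear decay. 14000 total steps is taken from the last row of the training
# table and is an assumption for this example.
import torch
from transformers import get_linear_schedule_with_warmup

dummy_param = torch.nn.Parameter(torch.zeros(1))  # stand-in for model weights
optimizer = torch.optim.Adam([dummy_param], lr=7e-5, betas=(0.9, 0.999), eps=1e-8)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=2000, num_training_steps=14000
)
```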
📄 License
This model is licensed under the apache-2.0 license.