20220415 210530
Developed by lilitket
A speech recognition model fine-tuned from facebook/wav2vec2-xls-r-2b on the common_voice dataset
Downloads 20
Release Time: 4/15/2022
Model Overview
This is a fine-tuned model for speech recognition tasks, based on the wav2vec2-xls-r-2b architecture and trained on the common_voice dataset
Model Features
Large-scale Pre-trained Model Fine-tuning
Fine-tuned from the 2-billion-parameter wav2vec2-xls-r-2b model
Relatively Low Word Error Rate
Achieves a word error rate of 0.3881 on the evaluation set
Efficient Training
Trained with gradient accumulation (8 steps at a per-device batch size of 1) and native AMP mixed precision
Model Capabilities
Speech-to-Text
Automatic Speech Recognition
Use Cases
Speech Transcription
Speech-to-Text Service
Convert speech content into text transcripts
Word error rate 0.3881
Assistive Technology
Real-time Caption Generation
Generate real-time captions for video or live streaming content
20220415-210530
This model is a fine-tuned version of facebook/wav2vec2-xls-r-2b on the common_voice dataset. Fine-tuning adapts the pre-trained base model to the characteristics of common_voice for speech recognition.
Quick Start
On the evaluation set, the model achieves the following results:
- Loss: 0.6544
- Wer: 0.3881
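The card itself gives no usage snippet, so the following is only a minimal sketch of how a wav2vec2 checkpoint like this one is typically loaded through the Transformers `pipeline` API. The repository id `lilitket/20220415-210530` is an assumption assembled from the developer and model name shown on this page, and `transcribe` is a hypothetical helper, not something the author provides.

```python
# Assumption: repository id built from the developer and model name on this page.
MODEL_ID = "lilitket/20220415-210530"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file with the Transformers ASR pipeline (hypothetical helper)."""
    # Imported lazily so this file still loads when transformers is not installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # Given a file path, the pipeline decodes the audio (this needs ffmpeg)
    # and returns a dict whose "text" key holds the transcript.
    return asr(audio_path)["text"]
```

Calling `transcribe("sample.wav")` (with `sample.wav` standing in for any local recording) would download the checkpoint on first use. The base model has about 2 billion parameters, so loading it is likely to need a large amount of memory.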
Documentation
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 400
- num_epochs: 1200
- mixed_precision_training: Native AMP
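To make the relationship between these settings concrete, here is a small sketch of how the total batch size follows from the per-device batch size and gradient accumulation, and of what a linear schedule with warmup looks like. Two things here are assumptions rather than statements from the card: training on a single device, and 88 optimizer steps per epoch (inferred from the results table, where step 8800 is logged at epoch 100.0). The function names are illustrative.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-05
TRAIN_BATCH_SIZE = 1
GRADIENT_ACCUMULATION_STEPS = 8
WARMUP_STEPS = 400
NUM_EPOCHS = 1200

# Assumption: inferred from the results table (step 8800 at epoch 100.0).
STEPS_PER_EPOCH = 88


def effective_batch_size(per_device: int, accumulation_steps: int, num_devices: int = 1) -> int:
    """Number of examples that contribute to one optimizer update."""
    return per_device * accumulation_steps * num_devices


def linear_warmup_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear ramp from 0 to base_lr over the warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


total_steps = NUM_EPOCHS * STEPS_PER_EPOCH  # planned optimizer steps under the assumption above

print(effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))  # 8
for step in (0, 200, 400, total_steps):
    print(step, linear_warmup_lr(step, LEARNING_RATE, WARMUP_STEPS, total_steps))
```

With a per-device batch size of 1, accumulation is what produces the listed total_train_batch_size of 8: gradients from 8 single-example forward and backward passes are summed before each optimizer update.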
Training results
Training Loss | Epoch | Step | Validation Loss | Wer |
---|---|---|---|---|
6.1495 | 2.27 | 200 | 2.4098 | 1.0 |
0.4347 | 4.54 | 400 | 1.4211 | 0.9914 |
0.2295 | 6.82 | 600 | 1.0229 | 0.9349 |
0.1349 | 9.09 | 800 | 1.0063 | 0.9228 |
0.1001 | 11.36 | 1000 | 1.0333 | 0.9197 |
0.0847 | 13.63 | 1200 | 0.9021 | 0.8725 |
0.0697 | 15.91 | 1400 | 0.9117 | 0.8779 |
0.0634 | 18.18 | 1600 | 0.9550 | 0.8725 |
0.0607 | 20.45 | 1800 | 0.9063 | 0.8303 |
0.0551 | 22.73 | 2000 | 0.8163 | 0.7956 |
0.0536 | 25.0 | 2200 | 0.7385 | 0.7235 |
0.0511 | 27.27 | 2400 | 0.7917 | 0.7215 |
0.0449 | 29.54 | 2600 | 0.7508 | 0.6938 |
0.0417 | 31.82 | 2800 | 0.6892 | 0.6775 |
0.0415 | 34.09 | 3000 | 0.7029 | 0.6790 |
0.0384 | 36.36 | 3200 | 0.6839 | 0.6895 |
0.0392 | 38.63 | 3400 | 0.7067 | 0.6872 |
0.0358 | 40.91 | 3600 | 0.7310 | 0.6763 |
0.0337 | 43.18 | 3800 | 0.7139 | 0.6548 |
0.0362 | 45.45 | 4000 | 0.6975 | 0.6427 |
0.0311 | 47.73 | 4200 | 0.7054 | 0.6412 |
0.0327 | 50.0 | 4400 | 0.6530 | 0.6151 |
0.0286 | 52.27 | 4600 | 0.6565 | 0.6076 |
0.0304 | 54.54 | 4800 | 0.6931 | 0.6283 |
0.0285 | 56.82 | 5000 | 0.6966 | 0.6108 |
0.0279 | 59.09 | 5200 | 0.6473 | 0.5854 |
0.0276 | 61.36 | 5400 | 0.6497 | 0.5920 |
0.0238 | 63.63 | 5600 | 0.6283 | 0.5846 |
0.0237 | 65.91 | 5800 | 0.6871 | 0.5885 |
0.0221 | 68.18 | 6000 | 0.6518 | 0.5593 |
0.0221 | 70.45 | 6200 | 0.6676 | 0.5601 |
0.0215 | 72.73 | 6400 | 0.6299 | 0.5550 |
0.022 | 75.0 | 6600 | 0.6719 | 0.5636 |
0.0198 | 77.27 | 6800 | 0.6082 | 0.5569 |
0.0222 | 79.54 | 7000 | 0.6156 | 0.5589 |
0.0172 | 81.82 | 7200 | 0.6414 | 0.5636 |
0.0188 | 84.09 | 7400 | 0.5874 | 0.5347 |
0.0202 | 86.36 | 7600 | 0.6320 | 0.5421 |
0.0165 | 88.63 | 7800 | 0.6345 | 0.5304 |
0.0164 | 90.91 | 8000 | 0.6243 | 0.5289 |
0.0167 | 93.18 | 8200 | 0.6237 | 0.5285 |
0.015 | 95.45 | 8400 | 0.5937 | 0.5203 |
0.0169 | 97.73 | 8600 | 0.6171 | 0.5343 |
0.0147 | 100.0 | 8800 | 0.6857 | 0.5476 |
0.0164 | 102.27 | 9000 | 0.6099 | 0.5160 |
0.0152 | 104.54 | 9200 | 0.6319 | 0.5285 |
0.0149 | 106.82 | 9400 | 0.6133 | 0.5296 |
0.0155 | 109.09 | 9600 | 0.6237 | 0.5285 |
0.0149 | 111.36 | 9800 | 0.6127 | 0.5012 |
0.0142 | 113.63 | 10000 | 0.6119 | 0.4836 |
0.013 | 115.91 | 10200 | 0.5974 | 0.4746 |
0.012 | 118.18 | 10400 | 0.6296 | 0.5016 |
0.0137 | 120.45 | 10600 | 0.5990 | 0.5023 |
0.0146 | 122.73 | 10800 | 0.5784 | 0.4875 |
0.0117 | 125.0 | 11000 | 0.5436 | 0.4766 |
0.0133 | 127.27 | 11200 | 0.5890 | 0.5020 |
0.0133 | 129.54 | 11400 | 0.6028 | 0.4895 |
0.0119 | 131.82 | 11600 | 0.5483 | 0.4840 |
0.0133 | 134.09 | 11800 | 0.5638 | 0.4934 |
0.0108 | 136.36 | 12000 | 0.5750 | 0.4758 |
0.0098 | 138.63 | 12200 | 0.5978 | 0.4891 |
0.012 | 140.91 | 12400 | 0.5524 | 0.4805 |
0.01 | 143.18 | 12600 | 0.5731 | 0.4895 |
0.0125 | 145.45 | 12800 | 0.5583 | 0.4579 |
0.0102 | 147.73 | 13000 | 0.5806 | 0.5035 |
0.01 | 150.0 | 13200 | 0.5721 | 0.4711 |
0.0113 | 152.27 | 13400 | 0.5351 | 0.4602 |
0.011 | 154.54 | 13600 | 0.5472 | 0.4551 |
0.0078 | 156.82 | 13800 | 0.6011 | 0.4610 |
0.0105 | 159.09 | 14000 | 0.5702 | 0.4672 |
0.0081 | 161.36 | 14200 | 0.5643 | 0.4454 |
0.0088 | 163.63 | 14400 | 0.5084 | 0.4536 |
0.0094 | 165.91 | 14600 | 0.5320 | 0.4680 |
0.0083 | 168.18 | 14800 | 0.5175 | 0.4423 |
0.0095 | 170.45 | 15000 | 0.5213 | 0.4583 |
0.0097 | 172.73 | 15200 | 0.5242 | 0.4590 |
0.0092 | 175.0 | 15400 | 0.5680 | 0.4587 |
0.0081 | 177.27 | 15600 | 0.5668 | 0.4579 |
0.0075 | 179.54 | 15800 | 0.5602 | 0.4489 |
0.0094 | 181.82 | 16000 | 0.5540 | 0.4485 |
0.0083 | 184.09 | 16200 | 0.5367 | 0.4278 |
0.0084 | 186.36 | 16400 | 0.5376 | 0.4583 |
0.0093 | 188.63 | 16600 | 0.5599 | 0.4310 |
0.0085 | 190.91 | 16800 | 0.5356 | 0.4317 |
0.0066 | 193.18 | 17000 | 0.5517 | 0.4419 |
0.0074 | 195.45 | 17200 | 0.5401 | 0.4329 |
0.0094 | 197.73 | 17400 | 0.5067 | 0.4415 |
0.0078 | 200.0 | 17600 | 0.5410 | 0.4466 |
0.0085 | 202.27 | 17800 | 0.5157 | 0.4321 |
0.0081 | 204.54 | 18000 | 0.5390 | 0.4255 |
0.0068 | 206.82 | 18200 | 0.5566 | 0.4415 |
0.0069 | 209.09 | 18400 | 0.5693 | 0.4341 |
0.0089 | 211.36 | 18600 | 0.5588 | 0.4438 |
0.0086 | 213.63 | 18800 | 0.5656 | 0.4470 |
0.008 | 215.91 | 19000 | 0.5712 | 0.4438 |
0.0083 | 218.18 | 19200 | 0.5627 | 0.4423 |
0.0078 | 220.45 | 19400 | 0.5905 | 0.4298 |
0.0059 | 222.73 | 19600 | 0.5746 | 0.4228 |
0.0072 | 225.0 | 19800 | 0.5362 | 0.4275 |
0.006 | 227.27 | 20000 | 0.5909 | 0.4220 |
0.0074 | 229.54 | 20200 | 0.5863 | 0.4224 |
0.0079 | 231.82 | 20400 | 0.5366 | 0.4306 |
0.0066 | 234.09 | 20600 | 0.5128 | 0.4302 |
0.0068 | 236.36 | 20800 | 0.5436 | 0.4228 |
0.0073 | 238.63 | 21000 | 0.5731 | 0.4325 |
0.0081 | 240.91 | 21200 | 0.5189 | 0.4177 |
0.0061 | 243.18 | 21400 | 0.5593 | 0.4236 |
0.0061 | 245.45 | 21600 | 0.5553 | 0.4267 |
0.0044 | 247.73 | 21800 | 0.5763 | 0.4286 |
0.0064 | 250.0 | 22000 | 0.5360 | 0.4321 |
0.006 | 252.27 | 22200 | 0.5577 | 0.4372 |
0.0052 | 254.54 | 22400 | 0.5387 | 0.4122 |
0.0054 | 256.82 | 22600 | 0.5117 | 0.4239 |
0.0057 | 259.09 | 22800 | 0.5498 | 0.4232 |
0.0069 | 261.36 | 23000 | 0.5263 | 0.4353 |
0.005 | 263.63 | 23200 | 0.5147 | 0.4177 |
0.0058 | 265.91 | 23400 | 0.5273 | 0.4173 |
0.006 | 268.18 | 23600 | 0.5879 | 0.4380 |
0.0059 | 270.45 | 23800 | 0.5377 | 0.4349 |
0.0055 | 272.73 | 24000 | 0.6061 | 0.4364 |
0.0058 | 275.0 | 24200 | 0.5977 | 0.4353 |
0.0051 | 277.27 | 24400 | 0.5847 | 0.4208 |
0.0046 | 279.54 | 24600 | 0.5728 | 0.4333 |
0.006 | 281.82 | 24800 | 0.5392 | 0.4204 |
0.0074 | 284.09 | 25000 | 0.5618 | 0.4232 |
0.0058 | 286.36 | 25200 | 0.5449 | 0.4197 |
0.0057 | 288.63 | 25400 | 0.5635 | 0.4169 |
0.0054 | 290.91 | 25600 | 0.5313 | 0.4173 |
0.0044 | 293.18 | 25800 | 0.5544 | 0.4306 |
0.0039 | 295.45 | 26000 | 0.5392 | 0.4247 |
0.0054 | 297.73 | 26200 | 0.5395 | 0.4271 |
0.0044 | 300.0 | 26400 | 0.5489 | 0.4228 |
0.0042 | 302.27 | 26600 | 0.5414 | 0.4173 |
0.0051 | 304.54 | 26800 | 0.5198 | 0.4193 |
0.005 | 306.82 | 27000 | 0.5297 | 0.4146 |
0.0051 | 309.09 | 27200 | 0.5414 | 0.4212 |
0.0057 | 311.36 | 27400 | 0.5204 | 0.4228 |
0.0049 | 313.63 | 27600 | 0.5806 | 0.4239 |
0.0036 | 315.91 | 27800 | 0.5771 | 0.4173 |
0.0045 | 318.18 | 28000 | 0.5517 | 0.4239 |
0.0051 | 320.45 | 28200 | 0.5498 | 0.4173 |
0.0043 | 322.73 | 28400 | 0.5791 | 0.4181 |
0.0044 | 325.0 | 28600 | 0.6030 | 0.4200 |
0.0067 | 327.27 | 28800 | 0.5799 | 0.4208 |
0.0041 | 329.54 | 29000 | 0.5871 | 0.4134 |
0.0048 | 331.82 | 29200 | 0.5471 | 0.4158 |
0.0031 | 334.09 | 29400 | 0.5977 | 0.4220 |
0.0042 | 336.36 | 29600 | 0.5813 | 0.4181 |
0.0045 | 338.63 | 29800 | 0.6167 | 0.4306 |
0.0044 | 340.91 | 30000 | 0.5661 | 0.4173 |
0.0029 | 343.18 | 30200 | 0.5680 | 0.4158 |
0.0037 | 345.45 | 30400 | 0.5747 | 0.4204 |
0.005 | 347.73 | 30600 | 0.5883 | 0.4349 |
0.0037 | 350.0 | 30800 | 0.6187 | 0.4189 |
0.0044 | 352.27 | 31000 | 0.5834 | 0.4431 |
0.0047 | 354.54 | 31200 | 0.5567 | 0.4247 |
0.0039 | 356.82 | 31400 | 0.5900 | 0.4314 |
0.0044 | 359.09 | 31600 | 0.5879 | 0.4216 |
0.0042 | 361.36 | 31800 | 0.5639 | 0.4220 |
0.0046 | 363.63 | 32000 | 0.5292 | 0.4185 |
0.0043 | 365.91 | 32200 | 0.5640 | 0.4353 |
0.0033 | 368.18 | 32400 | 0.5468 | 0.4208 |
0.002 | 370.45 | 32600 | 0.5836 | 0.4220 |
0.0043 | 372.73 | 32800 | 0.5692 | 0.4142 |
0.0038 | 375.0 | 33000 | 0.5739 | 0.4177 |
0.0039 | 377.27 | 33200 | 0.5824 | 0.4103 |
0.0028 | 379.54 | 33400 | 0.6069 | 0.4111 |
0.0038 | 381.82 | 33600 | 0.5868 | 0.4185 |
0.0041 | 384.09 | 33800 | 0.5169 | 0.4126 |
0.0037 | 386.36 | 34000 | 0.5395 | 0.4275 |
0.0063 | 388.63 | 34200 | 0.5293 | 0.4294 |
0.0042 | 390.91 | 34400 | 0.5472 | 0.4165 |
0.0039 | 393.18 | 34600 | 0.5391 | 0.4091 |
0.0036 | 395.45 | 34800 | 0.5360 | 0.4239 |
0.0036 | 397.73 | 35000 | 0.5511 | 0.4177 |
0.0019 | 400.0 | 35200 | 0.5775 | 0.4115 |
0.0038 | 402.27 | 35400 | 0.5376 | 0.4087 |
0.0035 | 404.54 | 35600 | 0.5755 | 0.4130 |
0.0042 | 406.82 | 35800 | 0.5443 | 0.4087 |
0.0036 | 409.09 | 36000 | 0.6091 | 0.4200 |
0.004 | 411.36 | 36200 | 0.5817 | 0.4247 |
0.0039 | 413.63 | 36400 | 0.5779 | 0.4255 |
0.003 | 415.91 | 36600 |
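The Wer column above reports word error rate. The card does not say how it was computed, so the following is only a sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) between reference and hypothesis, divided by the number of reference words. The function name is illustrative, and evaluation tools may normalize text (casing, punctuation) differently before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Under this definition a value of 0.3881 corresponds to roughly 39 word errors per 100 reference words. An empty hypothesis scores 1.0, and insertions can push the value above 1.0.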
License
This model is licensed under the Apache 2.0 license.