whisper-large-v2-mn-13
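This model is a fine-tuned version of openai/whisper-large-v2 for automatic speech recognition. The auto-generated card records the training dataset as None; the metadata under Additional Information lists the associated datasets and the evaluation split (Common Voice 11.0, config mn, split test). On that evaluation set the model reaches a WER of 20.02 and a CER of 6.60.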
Quick Start
The model achieves the following results on the evaluation set:
- Loss: 0.1689
- Wer: 20.0240
- Cer: 6.6010
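The card does not say how WER and CER were computed or what text normalization was applied. As a rough illustration of what the two numbers measure, here is a minimal pure-Python sketch that assumes the standard definitions (edit distance over words or characters, divided by the reference length, reported as a percentage). The function names are illustrative and are not part of this model's tooling.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences
    (counts substitutions, insertions and deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent for one reference/hypothesis pair."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent (spaces count as characters here)."""
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    ref = "сайн байна уу"
    hyp = "сайн байна"
    print(f"WER: {wer(ref, hyp):.2f}  CER: {cer(ref, hyp):.2f}")
```

Corpus-level scores are usually computed by summing edits and reference lengths over all utterances rather than averaging per-utterance rates, so this sketch will not reproduce the figures above exactly.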
Documentation
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 25000
- mixed_precision_training: Native AMP
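To make the schedule concrete, the sketch below computes the learning rate at a given step. It assumes that `lr_scheduler_type: linear` means the usual behaviour of Transformers' `get_linear_schedule_with_warmup`: a linear ramp from 0 to the peak rate over the warmup steps, then a linear decay to 0 at the last training step. The function name `linear_lr` is illustrative only.

```python
def linear_lr(step, base_lr=1e-5, warmup_steps=500, total_steps=25000):
    """Learning rate at `step` for linear warmup followed by linear decay.

    Defaults are taken from the hyperparameters listed above; the shape of
    the schedule itself is an assumption, not something the card states.
    """
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from base_lr to 0 at total_steps.
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)


if __name__ == "__main__":
    for s in (0, 250, 500, 12750, 25000):
        print(s, linear_lr(s))
```

Under this assumption the rate peaks at step 500 and has fallen to half its peak value by step 12750.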
Training results
| Training Loss | Epoch | Step | Cer | Validation Loss | Wer |
|---|---|---|---|---|---|
| 0.3921 | 0.09 | 1000 | 15.7845 | 0.4101 | 46.9030 |
| 0.3115 | 0.17 | 2000 | 14.2911 | 0.3353 | 41.8451 |
| 0.2659 | 0.26 | 3000 | 11.8131 | 0.2800 | 34.6406 |
| 0.2477 | 0.35 | 4000 | 10.6659 | 0.2578 | 32.0024 |
| 0.2274 | 0.43 | 5000 | 10.0460 | 0.2463 | 30.3419 |
| 0.2059 | 0.52 | 6000 | 9.9264 | 0.2305 | 28.5558 |
| 0.2092 | 0.61 | 7000 | 9.4277 | 0.2196 | 27.8785 |
| 0.1956 | 0.69 | 8000 | 9.2745 | 0.2093 | 26.8353 |
| 0.195 | 0.78 | 9000 | 8.9485 | 0.2042 | 26.6168 |
| 0.195 | 0.87 | 10000 | 8.5324 | 0.2001 | 25.6718 |
| 0.1795 | 0.95 | 11000 | 8.1786 | 0.1936 | 24.1698 |
| 0.1575 | 1.04 | 12000 | 7.8653 | 0.1915 | 23.8912 |
| 0.1358 | 1.13 | 13000 | 7.6749 | 0.1918 | 23.3778 |
| 0.1509 | 1.21 | 14000 | 7.7221 | 0.1852 | 23.1811 |
| 0.1474 | 1.3 | 15000 | 7.3246 | 0.1764 | 22.4984 |
| 0.1461 | 1.39 | 16000 | 7.3187 | 0.1793 | 22.4110 |
| 0.134 | 1.47 | 17000 | 7.1123 | 0.1737 | 21.9412 |
| 0.1289 | 1.56 | 18000 | 7.4593 | 0.1727 | 22.0614 |
| 0.1287 | 1.65 | 19000 | 7.0230 | 0.1701 | 21.4223 |
| 0.1196 | 1.73 | 20000 | 6.9447 | 0.1666 | 21.2475 |
| 0.1275 | 1.82 | 21000 | 6.7956 | 0.1653 | 20.8106 |
| 0.1329 | 1.91 | 22000 | 6.7729 | 0.1622 | 20.3354 |
| 0.1294 | 1.99 | 23000 | 6.6448 | 0.1606 | 20.2207 |
| 0.1043 | 2.08 | 24000 | 6.6010 | 0.1689 | 20.0240 |
| 0.079 | 2.17 | 25000 | 6.6246 | 0.1687 | 20.1005 |
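The headline results reported for this model match the step-24000 row of the training results above, not the final step. The card does not state how that checkpoint was chosen. The short sketch below, using the last four rows copied from those results, shows that step 24000 is the one with the lowest WER, while the lowest validation loss occurs one evaluation earlier.

```python
# (step, cer, validation_loss, wer), copied from the last four evaluation rows.
rows = [
    (22000, 6.7729, 0.1622, 20.3354),
    (23000, 6.6448, 0.1606, 20.2207),
    (24000, 6.6010, 0.1689, 20.0240),
    (25000, 6.6246, 0.1687, 20.1005),
]

# Pick the evaluation step that minimizes each criterion.
best_by_wer = min(rows, key=lambda r: r[3])
best_by_loss = min(rows, key=lambda r: r[2])

print("lowest WER at step", best_by_wer[0])
print("lowest validation loss at step", best_by_loss[0])
```

Selecting by WER rather than by loss is a common choice for speech recognition, but whether it was the rule used here is an assumption.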
Framework versions
- Transformers 4.26.0.dev0
- PyTorch 1.13.1+cu117
- Datasets 2.8.1.dev0
- Tokenizers 0.13.2
License
This model is licensed under the Apache-2.0 license.
Additional Information
| Property | Details |
|---|---|
| Tags | whisper-event, hf-asr-leaderboard, generated_from_multiple_datasets |
| Datasets | mozilla-foundation/common_voice_11_0, google/fleurs, bayartsogt/ulaanbal-v0, bayartsogt/youtube-mongolian-v1 |
| Metrics | wer, cer |
| Model Index Name | whisper-large-v2-mn-13 |
| Evaluation Task | Automatic Speech Recognition |
| Evaluation Dataset | Common Voice 11.0 (mozilla-foundation/common_voice_11_0, config: mn, split: test) |
| Wer on Evaluation Set | 20.02403320952589 |
| Cer on Evaluation Set | 6.601024224251205 |