Whisper Base Hungarian
This is a Hungarian fine-tuned Whisper Base model that, in my tests, outperforms other Hungarian fine-tuned Base models on every dataset evaluated.
Quick Start
I've removed all of my initial attempts. This is the best Hungarian fine-tuned Whisper Base model I could produce with the currently available tools and data, and it outperforms other Hungarian fine-tuned Base models by a wide margin on all datasets tested!
This model is a fine-tuned version of openai/whisper-base on the sarpba/big_audio_data_hun dataset.
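The snippet below is a minimal transcription sketch using the transformers ASR pipeline; the model id and audio file name are placeholders, so substitute this repository's Hub id and your own recording.

```python
from transformers import pipeline

# Placeholder Hub id -- replace with this repository's actual model id
asr = pipeline("automatic-speech-recognition", model="<this-repo-id>")

# Force Hungarian transcription (Whisper auto-detects the language otherwise)
result = asr(
    "sample_hu.wav",  # placeholder path to an audio file
    generate_kwargs={"language": "hungarian", "task": "transcribe"},
)
print(result["text"])
```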
Test results:
("google/fleurs", "hu_hu", "test") (during training)
- Loss: 0.7999
- Wer Ortho: 33.8788
- Wer: 29.4814
("mozilla-foundation/common_voice_17_0", "hu", "test")
- WER: 25.58
- CER: 6.34
- Normalised WER: 21.18
- Normalised CER: 5.31
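The Common Voice scores can be recomputed along these lines; this is a sketch assuming the evaluate library's WER/CER metrics and Whisper's BasicTextNormalizer for the normalised variants (the example strings are illustrative, and the exact normaliser the author used is an assumption).

```python
import evaluate
from transformers.models.whisper.english_normalizer import BasicTextNormalizer

wer = evaluate.load("wer")
cer = evaluate.load("cer")
normalizer = BasicTextNormalizer()

# Illustrative reference transcripts and model outputs
refs = ["ez egy példa mondat"]
hyps = ["ez egy pelda mondat"]

print("WER:", 100 * wer.compute(references=refs, predictions=hyps))
print("CER:", 100 * cer.compute(references=refs, predictions=hyps))

# Normalised variants: strip punctuation and casing before scoring
norm_refs = [normalizer(r) for r in refs]
norm_hyps = [normalizer(h) for h in hyps]
print("Normalised WER:", 100 * wer.compute(references=norm_refs, predictions=norm_hyps))
print("Normalised CER:", 100 * cer.compute(references=norm_refs, predictions=norm_hyps))
```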
Features
- High Performance: Achieves significantly better results than other Hungarian fine-tuned base models on all datasets.
- Fine-tuned for Hungarian: Specifically fine-tuned for the Hungarian language on a unique dataset.
Documentation
Model description
A Whisper Base model fine-tuned for Hungarian on a unique dataset.
Intended uses & limitations
Important Note
Commercial use of this fine-tuned model is not permitted without my consent. For personal use, it is freely available under the original Whisper license terms.
Training and evaluation data
The model was trained on approximately 1,200 hours of carefully selected Hungarian audio material. During training, progress was monitored with tests on google/fleurs; the results on mozilla-foundation/common_voice_17_0 are listed under Test results above.
Neither dataset was included in the training data, so the model is not contaminated with test material.
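For reference, both held-out test sets can be loaded with the datasets library; a minimal sketch (note that common_voice_17_0 is gated on the Hub and requires accepting its terms first):

```python
from datasets import Audio, load_dataset

# Held-out evaluation sets; neither appears in the training data
fleurs_test = load_dataset("google/fleurs", "hu_hu", split="test")
cv17_test = load_dataset("mozilla-foundation/common_voice_17_0", "hu", split="test")

# Whisper's feature extractor expects 16 kHz audio
cv17_test = cv17_test.cast_column("audio", Audio(sampling_rate=16_000))
```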
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0003
- train_batch_size: 64
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 256
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.05
- training_steps: 8000
- mixed_precision_training: Native AMP
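For orientation, the settings above map onto transformers' Seq2SeqTrainingArguments roughly as follows; this is a sketch assuming the standard Whisper fine-tuning recipe, with an illustrative output directory, not the author's exact training script.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-base-hu",   # illustrative path
    learning_rate=3e-4,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=4,    # effective batch size: 64 * 4 = 256
    seed=42,
    lr_scheduler_type="linear",
    warmup_ratio=0.05,
    max_steps=8000,
    fp16=True,                        # native AMP mixed precision
    predict_with_generate=True,
)
```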
Training results
| Training Loss | Epoch | Step | Validation Loss | Wer Ortho | Wer |
|---------------|-------|------|-----------------|-----------|-----|
| 0.2523 | 0.3770 | 1000 | 0.9703 | 50.8988 | 46.7185 |
| 0.1859 | 0.7539 | 2000 | 0.8605 | 43.4345 | 39.4103 |
| 0.127  | 1.1309 | 3000 | 0.8378 | 40.6107 | 36.0040 |
| 0.1226 | 1.5079 | 4000 | 0.8153 | 38.9189 | 34.1842 |
| 0.1105 | 1.8848 | 5000 | 0.7847 | 36.6018 | 32.1979 |
| 0.0659 | 2.2618 | 6000 | 0.8298 | 35.3752 | 30.6379 |
| 0.0594 | 2.6388 | 7000 | 0.8132 | 34.8255 | 30.2280 |
| 0.0316 | 3.0157 | 8000 | 0.7999 | 33.8788 | 29.4814 |
Framework versions
- Transformers 4.45.2
- Pytorch 2.3.0+cu121
- Datasets 3.0.1
- Tokenizers 0.20.1
License
The model can be freely used for personal purposes under the original license terms of Whisper. Commercial use requires the author's permission.
Model Information

| Property | Details |
|----------|---------|
| Library Name | transformers |
| Language | Hungarian |
| Base Model | openai/whisper-base |
| Tags | generated_from_trainer |
| Datasets | fleurs |
| Metrics | wer |
| Model Name | Whisper Base Hungarian v1 |
| Task | Automatic Speech Recognition |
| Dataset Name | google/fleurs |
| Dataset Type | fleurs |
| Dataset Config | hu_hu |
| Dataset Split | test |
| Dataset Args | hu_hu |
| WER Value | 29.48142356294297 |