Whisper Base Hungarian
This is a Hungarian fine-tuned Whisper Base model that, in my tests, outperforms other Hungarian fine-tuned Base models on every dataset evaluated.
Quick Start
I've removed all of my initial attempts. This is the best Hungarian fine-tuned Whisper Base model I could produce with the currently available tools and data, and it outperforms other Hungarian fine-tuned Base models by a wide margin on all datasets tested!
This model is a fine-tuned version of openai/whisper-base on the sarpba/big_audio_data_hun dataset.
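The snippet below is a minimal transcription sketch using the transformers ASR pipeline; the model id and audio file name are placeholders, so substitute this repository's Hub id and your own recording.

```python
from transformers import pipeline

# Placeholder Hub id -- replace with this repository's actual model id
asr = pipeline("automatic-speech-recognition", model="<this-repo-id>")

# Force Hungarian transcription (Whisper auto-detects the language otherwise)
result = asr(
    "sample_hu.wav",  # placeholder path to an audio file
    generate_kwargs={"language": "hungarian", "task": "transcribe"},
)
print(result["text"])
```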
Test results:
("google/fleurs", "hu_hu", "test") (during training)
- Loss: 0.7999
- Wer Ortho: 33.8788
- Wer: 29.4814
("mozilla-foundation/common_voice_17_0", "hu", "test")
- WER: 25.58
- CER: 6.34
- Normalised WER: 21.18
- Normalised CER: 5.31
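The Common Voice scores can be recomputed along these lines; this is a sketch assuming the evaluate library's WER/CER metrics and Whisper's BasicTextNormalizer for the normalised variants (the example strings are illustrative, and the exact normaliser the author used is an assumption).

```python
import evaluate
from transformers.models.whisper.english_normalizer import BasicTextNormalizer

wer = evaluate.load("wer")
cer = evaluate.load("cer")
normalizer = BasicTextNormalizer()

# Illustrative reference transcripts and model outputs
refs = ["ez egy példa mondat"]
hyps = ["ez egy pelda mondat"]

print("WER:", 100 * wer.compute(references=refs, predictions=hyps))
print("CER:", 100 * cer.compute(references=refs, predictions=hyps))

# Normalised variants: strip punctuation and casing before scoring
norm_refs = [normalizer(r) for r in refs]
norm_hyps = [normalizer(h) for h in hyps]
print("Normalised WER:", 100 * wer.compute(references=norm_refs, predictions=norm_hyps))
print("Normalised CER:", 100 * cer.compute(references=norm_refs, predictions=norm_hyps))
```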
Features
- High Performance: Achieves significantly better results than other Hungarian fine-tuned base models on all datasets.
- Fine-tuned for Hungarian: Specifically fine-tuned for the Hungarian language on a unique dataset.
Documentation
Model description
A Whisper Base model fine-tuned for Hungarian on a unique dataset.
Intended uses & limitations
Important Note
Commercial use of this fine-tuned model is not permitted without my consent. For personal use, it is freely available under the original Whisper license terms.
Training and evaluation data
The model was trained on approximately 1,200 hours of carefully selected Hungarian audio material. During training, progress was monitored with tests on google/fleurs; the results on mozilla-foundation/common_voice_17_0 are listed under Test results above.
Neither dataset was included in the training data, so the model is not contaminated with test material.
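For reference, both held-out test sets can be loaded with the datasets library; a minimal sketch (note that common_voice_17_0 is gated on the Hub and requires accepting its terms first):

```python
from datasets import Audio, load_dataset

# Held-out evaluation sets; neither appears in the training data
fleurs_test = load_dataset("google/fleurs", "hu_hu", split="test")
cv17_test = load_dataset("mozilla-foundation/common_voice_17_0", "hu", split="test")

# Whisper's feature extractor expects 16 kHz audio
cv17_test = cv17_test.cast_column("audio", Audio(sampling_rate=16_000))
```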
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0003
- train_batch_size: 64
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 256
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.05
- training_steps: 8000
- mixed_precision_training: Native AMP
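For orientation, the settings above map onto transformers' Seq2SeqTrainingArguments roughly as follows; this is a sketch assuming the standard Whisper fine-tuning recipe, with an illustrative output directory, not the author's exact training script.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-base-hu",   # illustrative path
    learning_rate=3e-4,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=4,    # effective batch size: 64 * 4 = 256
    seed=42,
    lr_scheduler_type="linear",
    warmup_ratio=0.05,
    max_steps=8000,
    fp16=True,                        # native AMP mixed precision
    predict_with_generate=True,
)
```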
Training results
| Training Loss | Epoch | Step | Validation Loss | Wer Ortho | Wer |
|---------------|-------|------|-----------------|-----------|-----|
| 0.2523 | 0.3770 | 1000 | 0.9703 | 50.8988 | 46.7185 |
| 0.1859 | 0.7539 | 2000 | 0.8605 | 43.4345 | 39.4103 |
| 0.127  | 1.1309 | 3000 | 0.8378 | 40.6107 | 36.0040 |
| 0.1226 | 1.5079 | 4000 | 0.8153 | 38.9189 | 34.1842 |
| 0.1105 | 1.8848 | 5000 | 0.7847 | 36.6018 | 32.1979 |
| 0.0659 | 2.2618 | 6000 | 0.8298 | 35.3752 | 30.6379 |
| 0.0594 | 2.6388 | 7000 | 0.8132 | 34.8255 | 30.2280 |
| 0.0316 | 3.0157 | 8000 | 0.7999 | 33.8788 | 29.4814 |
Framework versions
- Transformers 4.45.2
- Pytorch 2.3.0+cu121
- Datasets 3.0.1
- Tokenizers 0.20.1
License
The model can be freely used for personal purposes under the original license terms of Whisper. Commercial use requires the author's permission.
Model Information

| Property | Details |
|----------|---------|
| Library Name | transformers |
| Language | Hungarian |
| Base Model | openai/whisper-base |
| Tags | generated_from_trainer |
| Datasets | fleurs |
| Metrics | wer |
| Model Name | Whisper Base Hungarian v1 |
| Task | Automatic Speech Recognition |
| Dataset Name | google/fleurs |
| Dataset Type | fleurs |
| Dataset Config | hu_hu |
| Dataset Split | test |
| Dataset Args | hu_hu |
| WER Value | 29.48142356294297 |