🚀 Model Card: ivrit-ai Hebrew Finetune of Whisper Large v3
This model is a Hebrew finetune (continued training) of OpenAI's Whisper Large v3 model, adapted specifically for Hebrew audio transcription.
✨ Features
- Based on the OpenAI Whisper Large v3 model, finetuned for Hebrew language processing.
- Suitable for Hebrew audio transcription tasks.
📦 Installation
No specific installation steps are provided in the original document. In practice, the standard Hugging Face stack is typically sufficient (for example, `pip install transformers torch`), though this is an assumption rather than something the original card specifies.
💻 Usage Examples
No code examples are provided in the original document; a minimal, illustrative sketch is given below.
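The following is an assumption-based example using the Hugging Face `transformers` ASR pipeline. The model identifier and audio path are placeholders, and the language token is forced to Hebrew, as this model expects (see the note under "Bias, Risks, and Limitations").

```python
# Minimal transcription sketch using the transformers ASR pipeline.
# "ivrit-ai/whisper-large-v3" is a placeholder identifier; use this model's
# actual repository name on the Hugging Face Hub.
import torch
from transformers import pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"

asr = pipeline(
    "automatic-speech-recognition",
    model="ivrit-ai/whisper-large-v3",  # placeholder model id
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
    device=device,
)

# Language detection is degraded in this finetune, so force Hebrew explicitly.
result = asr(
    "audio.wav",  # placeholder path to a local audio file
    generate_kwargs={"language": "he", "task": "transcribe"},
    return_timestamps=True,  # required by the pipeline for audio longer than 30 seconds
)
print(result["text"])
```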
📚 Documentation
Model Details
Model Description
| Property | Details |
|----------|---------|
| Developed by | ivrit-ai |
| Language(s) (NLP) | Hebrew |
| License | Apache-2.0 |
| Finetuned from model | openai/whisper-large-v3 |
Bias, Risks, and Limitations
⚠️ Important Note
The language detection capability of this model was degraded during training. The model is intended for mostly-Hebrew audio transcription, and the language token should be explicitly set to Hebrew. Additionally, the translation task was not trained and has also degraded, so this model cannot translate with any reasonable quality.
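For lower-level usage with `WhisperProcessor` and `WhisperForConditionalGeneration`, the Hebrew language token and the transcription task can be set explicitly at generation time. The following is a sketch only; the model id and audio loading are illustrative assumptions.

```python
# Sketch: explicitly forcing the Hebrew language token with generate().
# The model id and audio file are placeholders.
import librosa
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model_id = "ivrit-ai/whisper-large-v3"  # placeholder
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)

# Whisper expects 16 kHz mono audio.
audio, _ = librosa.load("audio.wav", sr=16000)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Do not rely on automatic language detection; set language and task explicitly.
predicted_ids = model.generate(inputs.input_features, language="he", task="transcribe")
print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
```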
How to Get Started with the Model
Please follow the original Whisper Large v3 model card for usage details, substituting this model's name. Other weight formats and quantizations can also be found on the ivrit-ai Hugging Face page.
We provide simple example scripts for using this model, and its weights for other inference runtimes, in the "examples" folder of the training GitHub repository.
Training Details
Training Data
This model was trained on the following datasets (described further in the preprocessing notes below):
- Crowd Recital
- Crowd Transcribe
- Knesset
Training Procedure
This model is a weighted average of the 3 lowest-eval-loss checkpoints from the same training run. Training code can be found in the ivrit-ai GitHub training repository.
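The exact averaging procedure is part of the training code; as a rough, assumption-based illustration of the idea, averaging checkpoints amounts to averaging their parameter tensors:

```python
# Illustrative sketch of weight-averaging several checkpoints from the same run.
# Checkpoint paths and the output directory are hypothetical; the actual
# procedure lives in the ivrit-ai training repository.
import torch
from transformers import WhisperForConditionalGeneration

checkpoint_dirs = ["checkpoint-a", "checkpoint-b", "checkpoint-c"]  # 3 lowest eval-loss checkpoints
models = [WhisperForConditionalGeneration.from_pretrained(d) for d in checkpoint_dirs]

reference_state = models[0].state_dict()
averaged_state = {}
for key, ref_tensor in reference_state.items():
    if ref_tensor.is_floating_point():
        stacked = torch.stack([m.state_dict()[key].float() for m in models])
        averaged_state[key] = stacked.mean(dim=0).to(ref_tensor.dtype)
    else:
        # Non-float buffers (if any) are copied from the first checkpoint as-is.
        averaged_state[key] = ref_tensor

merged = WhisperForConditionalGeneration.from_pretrained(checkpoint_dirs[0])
merged.load_state_dict(averaged_state)
merged.save_pretrained("whisper-large-v3-hebrew-averaged")  # hypothetical output path
```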
Preprocessing
The "Crowd Recital" and "Knesset" datasets contain timestamps and previous text following the Whisper expected inputs. Timestamps were used from 40% of samples from those datasets, and 50% of the previous text was used.
The "Crowd Transcribe" datasets has no timestamps or previous text, and this preprocessing only included melspec feature extraction and text encoding.
Preprocessing code can be found within the training code repository.
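As a rough, illustrative sketch of that sampling (the real preprocessing is in the training repository, and all field names below are hypothetical):

```python
# Illustrative only: per-sample choice of whether to keep timestamp tokens and
# previous-text conditioning for the "Crowd Recital" and "Knesset" datasets.
import random

TIMESTAMP_PROB = 0.40  # timestamps kept for 40% of samples
PREV_TEXT_PROB = 0.50  # previous text kept for 50% of samples

def select_conditioning(sample: dict) -> dict:
    """Decide which conditioning signals to keep for one training sample."""
    keep_timestamps = random.random() < TIMESTAMP_PROB
    keep_prev_text = random.random() < PREV_TEXT_PROB
    return {
        "target_text": sample["text_with_timestamps"] if keep_timestamps else sample["text"],
        "prev_text": sample["prev_text"] if keep_prev_text else "",
    }
```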
Datasets were interleaved with a 0.15:0.8:0.05 ratio (knesset:crowd-transcribe:crowd-recital).
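Assuming the Hugging Face `datasets` library is used for this step (an assumption; the card only states the ratio), the interleaving might look like:

```python
# Sketch of interleaving the three training datasets with the stated probabilities.
# Dataset identifiers are placeholders; the real data pipeline is in the training repository.
from datasets import interleave_datasets, load_dataset

knesset = load_dataset("ivrit-ai/knesset", split="train", streaming=True)                    # placeholder id
crowd_transcribe = load_dataset("ivrit-ai/crowd-transcribe", split="train", streaming=True)  # placeholder id
crowd_recital = load_dataset("ivrit-ai/crowd-recital", split="train", streaming=True)        # placeholder id

train_dataset = interleave_datasets(
    [knesset, crowd_transcribe, crowd_recital],
    probabilities=[0.15, 0.8, 0.05],
    seed=42,
    stopping_strategy="all_exhausted",
)
```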
Training Hyperparameters
- Training regime: bf16 mixed precision with SDPA attention
- Learning rate: 1e-5 with linear decay and 800 warmup steps, scheduled for 5 epochs
- Batch size: 32
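A hedged sketch of how these hyperparameters could be expressed with `transformers` `Seq2SeqTrainingArguments`; the real configuration is in the training repository, and everything not listed above (output directory, per-device batch split, etc.) is an assumption:

```python
# Sketch: the hyperparameters above expressed as Seq2SeqTrainingArguments.
# Only the learning rate, schedule, warmup, epochs, batch size and bf16 come
# from this card; all other values are assumptions.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-large-v3-hebrew",  # hypothetical
    per_device_train_batch_size=4,         # assumed split: 8 GPUs x 4 = total batch size 32
    learning_rate=1e-5,
    lr_scheduler_type="linear",
    warmup_steps=800,
    num_train_epochs=5,
    bf16=True,
    predict_with_generate=True,
)

# SDPA attention is selected when loading the model, e.g.:
# model = WhisperForConditionalGeneration.from_pretrained(model_id, attn_implementation="sdpa")
```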
Training Hardware / Duration
- GPU type: a single machine with 8× Nvidia A40 GPUs
- Duration: ~10 hour run, stopped at 2.2 epochs
Evaluation
For evaluation results, please refer to the ivrit-ai/hebrew-transcription-leaderboard.
📄 License
This model is licensed under the Apache-2.0 license.