🚀 Model Card: ivrit-ai Hebrew Finetune of Whisper Large v3
This model is a Hebrew finetune (continued training) of OpenAI's Whisper Large v3 model, adapted specifically for Hebrew audio transcription.
✨ Features
- Based on the OpenAI Whisper Large v3 model, finetuned for Hebrew language processing.
- Suitable for Hebrew audio transcription tasks.
📦 Installation
No specific installation steps are provided in the original document. In practice, the standard Hugging Face stack is typically sufficient (for example, `pip install transformers torch`), though this is an assumption rather than something the original card specifies.
💻 Usage Examples
No code examples are provided in the original document; a minimal, illustrative sketch is given below.
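The following is an assumption-based example using the Hugging Face `transformers` ASR pipeline. The model identifier and audio path are placeholders, and the language token is forced to Hebrew, as this model expects (see the note under "Bias, Risks, and Limitations").

```python
# Minimal transcription sketch using the transformers ASR pipeline.
# "ivrit-ai/whisper-large-v3" is a placeholder identifier; use this model's
# actual repository name on the Hugging Face Hub.
import torch
from transformers import pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"

asr = pipeline(
    "automatic-speech-recognition",
    model="ivrit-ai/whisper-large-v3",  # placeholder model id
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
    device=device,
)

# Language detection is degraded in this finetune, so force Hebrew explicitly.
result = asr(
    "audio.wav",  # placeholder path to a local audio file
    generate_kwargs={"language": "he", "task": "transcribe"},
    return_timestamps=True,  # required by the pipeline for audio longer than 30 seconds
)
print(result["text"])
```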
📚 Documentation
Model Details
Model Description
| Property | Details |
|----------|---------|
| Developed by | ivrit-ai |
| Language(s) (NLP) | Hebrew |
| License | Apache-2.0 |
| Finetuned from model | openai/whisper-large-v3 |
Bias, Risks, and Limitations
⚠️ Important Note
The language detection capability of this model was degraded during training. The model is intended for mostly-Hebrew audio transcription, and the language token should be explicitly set to Hebrew. Additionally, the translation task was not trained and has also degraded, so this model cannot translate with any reasonable quality.
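For lower-level usage with `WhisperProcessor` and `WhisperForConditionalGeneration`, the Hebrew language token and the transcription task can be set explicitly at generation time. The following is a sketch only; the model id and audio loading are illustrative assumptions.

```python
# Sketch: explicitly forcing the Hebrew language token with generate().
# The model id and audio file are placeholders.
import librosa
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model_id = "ivrit-ai/whisper-large-v3"  # placeholder
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)

# Whisper expects 16 kHz mono audio.
audio, _ = librosa.load("audio.wav", sr=16000)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Do not rely on automatic language detection; set language and task explicitly.
predicted_ids = model.generate(inputs.input_features, language="he", task="transcribe")
print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
```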
How to Get Started with the Model
Please follow the original Whisper Large v3 model card for usage details, substituting this model's name. Other weight formats and quantizations can also be found on the ivrit-ai Hugging Face page.
We provide simple example scripts for using this model, and its weights for other inference runtimes, in the "examples" folder of the training GitHub repository.
Training Details
Training Data
This model was trained on the following datasets (described further in the preprocessing notes below):
- Crowd Recital
- Crowd Transcribe
- Knesset
Training Procedure
This model is a weighted average of the 3 lowest-eval-loss checkpoints from the same training run. Training code can be found in the ivrit-ai GitHub training repository.
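The exact averaging procedure is part of the training code; as a rough, assumption-based illustration of the idea, averaging checkpoints amounts to averaging their parameter tensors:

```python
# Illustrative sketch of weight-averaging several checkpoints from the same run.
# Checkpoint paths and the output directory are hypothetical; the actual
# procedure lives in the ivrit-ai training repository.
import torch
from transformers import WhisperForConditionalGeneration

checkpoint_dirs = ["checkpoint-a", "checkpoint-b", "checkpoint-c"]  # 3 lowest eval-loss checkpoints
models = [WhisperForConditionalGeneration.from_pretrained(d) for d in checkpoint_dirs]

reference_state = models[0].state_dict()
averaged_state = {}
for key, ref_tensor in reference_state.items():
    if ref_tensor.is_floating_point():
        stacked = torch.stack([m.state_dict()[key].float() for m in models])
        averaged_state[key] = stacked.mean(dim=0).to(ref_tensor.dtype)
    else:
        # Non-float buffers (if any) are copied from the first checkpoint as-is.
        averaged_state[key] = ref_tensor

merged = WhisperForConditionalGeneration.from_pretrained(checkpoint_dirs[0])
merged.load_state_dict(averaged_state)
merged.save_pretrained("whisper-large-v3-hebrew-averaged")  # hypothetical output path
```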
Preprocessing
The "Crowd Recital" and "Knesset" datasets contain timestamps and previous text following the Whisper expected inputs. Timestamps were used from 40% of samples from those datasets, and 50% of the previous text was used.
The "Crowd Transcribe" datasets has no timestamps or previous text, and this preprocessing only included melspec feature extraction and text encoding.
Preprocessing code can be found within the training code repository.
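As a rough, illustrative sketch of that sampling (the real preprocessing is in the training repository, and all field names below are hypothetical):

```python
# Illustrative only: per-sample choice of whether to keep timestamp tokens and
# previous-text conditioning for the "Crowd Recital" and "Knesset" datasets.
import random

TIMESTAMP_PROB = 0.40  # timestamps kept for 40% of samples
PREV_TEXT_PROB = 0.50  # previous text kept for 50% of samples

def select_conditioning(sample: dict) -> dict:
    """Decide which conditioning signals to keep for one training sample."""
    keep_timestamps = random.random() < TIMESTAMP_PROB
    keep_prev_text = random.random() < PREV_TEXT_PROB
    return {
        "target_text": sample["text_with_timestamps"] if keep_timestamps else sample["text"],
        "prev_text": sample["prev_text"] if keep_prev_text else "",
    }
```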
Datasets were interleaved with a 0.15:0.8:0.05 ratio (knesset:crowd-transcribe:crowd-recital).
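Assuming the Hugging Face `datasets` library is used for this step (an assumption; the card only states the ratio), the interleaving might look like:

```python
# Sketch of interleaving the three training datasets with the stated probabilities.
# Dataset identifiers are placeholders; the real data pipeline is in the training repository.
from datasets import interleave_datasets, load_dataset

knesset = load_dataset("ivrit-ai/knesset", split="train", streaming=True)                    # placeholder id
crowd_transcribe = load_dataset("ivrit-ai/crowd-transcribe", split="train", streaming=True)  # placeholder id
crowd_recital = load_dataset("ivrit-ai/crowd-recital", split="train", streaming=True)        # placeholder id

train_dataset = interleave_datasets(
    [knesset, crowd_transcribe, crowd_recital],
    probabilities=[0.15, 0.8, 0.05],
    seed=42,
    stopping_strategy="all_exhausted",
)
```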
Training Hyperparameters
- Training regime: bf16 mixed precision with SDPA attention
- Learning rate: 1e-5 with linear decay and 800 warmup steps, scheduled for 5 epochs
- Batch size: 32
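A hedged sketch of how these hyperparameters could be expressed with `transformers` `Seq2SeqTrainingArguments`; the real configuration is in the training repository, and everything not listed above (output directory, per-device batch split, etc.) is an assumption:

```python
# Sketch: the hyperparameters above expressed as Seq2SeqTrainingArguments.
# Only the learning rate, schedule, warmup, epochs, batch size and bf16 come
# from this card; all other values are assumptions.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-large-v3-hebrew",  # hypothetical
    per_device_train_batch_size=4,         # assumed split: 8 GPUs x 4 = total batch size 32
    learning_rate=1e-5,
    lr_scheduler_type="linear",
    warmup_steps=800,
    num_train_epochs=5,
    bf16=True,
    predict_with_generate=True,
)

# SDPA attention is selected when loading the model, e.g.:
# model = WhisperForConditionalGeneration.from_pretrained(model_id, attn_implementation="sdpa")
```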
Training Hardware / Duration
- GPU type: a single machine with 8× Nvidia A40 GPUs
- Duration: ~10 hour run, stopped at 2.2 epochs
Evaluation
For evaluation results, please refer to the ivrit-ai/hebrew-transcription-leaderboard.
📄 License
This model is licensed under the Apache-2.0 license.