WhisperLevantineArabic
A fine-tuned Whisper model for Levantine Arabic (Israeli dialect), enhancing automatic speech recognition for this specific Arabic variant.
Thanks to ivrit.ai for providing the fine-tuning scripts!
🚀 Quick Start
The fine-tuned model was converted with the faster-whisper package, enabling inference up to 4× faster than OpenAI's Whisper. The model expects 16 kHz audio input, so make sure your files are sampled at (or resampled to) 16 kHz for best results.
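A minimal sketch of loading the converted model and transcribing a clip (the model path and file name below are placeholders; adjust them to your setup):

```python
from faster_whisper import WhisperModel

# Point this at the converted (CTranslate2) model directory
model = WhisperModel("path/to/model")

# Transcribe a 16 kHz audio file in Arabic and print segment-level timestamps
segments, _ = model.transcribe("sample_16khz.wav", language="ar")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```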
✨ Features
- Fine-tuned for Levantine Arabic: Specifically tailored for transcribing Levantine Arabic, especially the Israeli dialect.
- Improved ASR Performance: Designed to enhance automatic speech recognition for this particular variant of Arabic.
- Faster Inference: Converted with `faster-whisper` for up to 4× faster inference compared to OpenAI's Whisper.
📦 Installation
To use the model, install `faster-whisper`:

```bash
pip install faster-whisper
```

The advanced usage example below also uses `librosa` (`pip install librosa`).
💻 Usage Examples
Basic Usage
The following command saves a `.vtt` file with transcriptions and timestamps in `audio_dir`:

```bash
python transcriber.py --model_path path/to/model --audio_dir path/to/audio --word_timestamps True --vad_filter True
```
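For reference, WebVTT output has the following general shape (the cue text and timings below are placeholders, not actual model output):

```
WEBVTT

00:00:00.000 --> 00:00:03.200
<transcribed Levantine Arabic text>

00:00:03.200 --> 00:00:07.450
<transcribed Levantine Arabic text>
```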
Advanced Usage
To print the transcription with word-level timestamps from Python (requires `librosa` in addition to `faster-whisper`):

```python
import faster_whisper
import librosa

# Load the converted model (pass the path to the converted model directory)
model = faster_whisper.WhisperModel("path/to/model")

audio_file = 'your audio file.wav'

# Load the audio at its native sample rate, then resample to the 16 kHz the model expects
audio_data, sample_rate = librosa.load(audio_file, sr=None)
audio_data = librosa.resample(audio_data, orig_sr=sample_rate, target_sr=16000)

# word_timestamps=True populates segment.words with per-word timings
segments, _ = model.transcribe(audio_data, language='ar', word_timestamps=True)
segments = list(segments)  # transcribe() returns a generator; materialize it so it can be reused

for segment in segments:
    for word in segment.words:
        print("[%.2fs -> %.2fs] %s" % (word.start, word.end, word.word))

transcript = ' '.join(s.text for s in segments)
```
📚 Documentation
Model Description
This model is a fine-tuned version of Whisper Large V3 tailored specifically for transcribing Levantine Arabic, focusing on the Israeli dialect. It is designed to improve automatic speech recognition (ASR) performance for this particular variant of Arabic.
| Property | Details |
|----------|---------|
| Model Type | Fine-tuned Whisper Large V3 |
| Fine-tuned for | Levantine Arabic (Israeli dialect) |
| WER on test set | 33% |
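The reported figure is a word error rate (WER): the number of word substitutions, deletions, and insertions, divided by the number of words in the reference transcript. A minimal sketch of scoring your own transcriptions, assuming the third-party `jiwer` package (not part of this repo):

```python
# pip install jiwer
import jiwer

reference = "the ground-truth transcript for a test clip"
hypothesis = "the transcript produced by the model"

# WER = (substitutions + deletions + insertions) / reference word count
print(f"WER: {jiwer.wer(reference, hypothesis):.2%}")
```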
Training Data
The dataset used for training and fine-tuning this model consists of approximately 1,200 hours of transcribed audio, primarily featuring Israeli Levantine Arabic, along with some general Levantine Arabic content. The data sources include:
- Self-maintained Collection: 1,200 hours of audio data curated by the team, covering a wide range of Israeli Levantine Arabic speech.
| Property | Details |
|----------|---------|
| Total Dataset Size | ~1,200 hours |
| Sampling Rate | 8 kHz, upsampled to 16 kHz |
| Annotation | Human-transcribed and annotated for high accuracy |
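Audio fed to the model at inference time should match this 16 kHz rate. A minimal sketch of the upsampling step (file names are placeholders; this is not the exact preprocessing pipeline used for training):

```python
import librosa
import soundfile as sf

# Load an 8 kHz recording at its native rate, then upsample to 16 kHz
audio, sr = librosa.load("recording_8khz.wav", sr=None)
audio_16k = librosa.resample(audio, orig_sr=sr, target_sr=16000)
sf.write("recording_16khz.wav", audio_16k, 16000)
```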
📄 License
This project is licensed under the Apache-2.0 license.