WhisperLevantineArabic
A fine-tuned Whisper model for Levantine Arabic (Israeli dialect), enhancing automatic speech recognition for this specific Arabic variant.
Thanks to ivrit.ai for providing the fine-tuning scripts!
🚀 Quick Start
The fine-tuned model was converted with the faster-whisper package, enabling inference up to 4× faster than OpenAI's Whisper. The model expects 16 kHz audio input, so make sure your files are sampled at (or resampled to) 16 kHz for best results.
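A minimal sketch of loading the converted model and transcribing a clip (the model path and file name below are placeholders; adjust them to your setup):

```python
from faster_whisper import WhisperModel

# Point this at the converted (CTranslate2) model directory
model = WhisperModel("path/to/model")

# Transcribe a 16 kHz audio file in Arabic and print segment-level timestamps
segments, _ = model.transcribe("sample_16khz.wav", language="ar")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```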
✨ Features
- Fine-tuned for Levantine Arabic: Specifically tailored for transcribing Levantine Arabic, especially the Israeli dialect.
- Improved ASR Performance: Designed to enhance automatic speech recognition for this particular variant of Arabic.
- Faster Inference: Converted with `faster-whisper` for up to 4× faster inference compared to OpenAI's Whisper.
📦 Installation
To use the model, install `faster-whisper`:

```bash
pip install faster-whisper
```

The advanced usage example below also uses `librosa` (`pip install librosa`).
💻 Usage Examples
Basic Usage
The following command saves a `.vtt` file with transcriptions and timestamps in `audio_dir`:

```bash
python transcriber.py --model_path path/to/model --audio_dir path/to/audio --word_timestamps True --vad_filter True
```
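For reference, WebVTT output has the following general shape (the cue text and timings below are placeholders, not actual model output):

```
WEBVTT

00:00:00.000 --> 00:00:03.200
<transcribed Levantine Arabic text>

00:00:03.200 --> 00:00:07.450
<transcribed Levantine Arabic text>
```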
Advanced Usage
To print the transcription with word-level timestamps from Python (requires `librosa` in addition to `faster-whisper`):

```python
import faster_whisper
import librosa

# Load the converted model (pass the path to the converted model directory)
model = faster_whisper.WhisperModel("path/to/model")

audio_file = 'your audio file.wav'

# Load the audio at its native sample rate, then resample to the 16 kHz the model expects
audio_data, sample_rate = librosa.load(audio_file, sr=None)
audio_data = librosa.resample(audio_data, orig_sr=sample_rate, target_sr=16000)

# word_timestamps=True populates segment.words with per-word timings
segments, _ = model.transcribe(audio_data, language='ar', word_timestamps=True)
segments = list(segments)  # transcribe() returns a generator; materialize it so it can be reused

for segment in segments:
    for word in segment.words:
        print("[%.2fs -> %.2fs] %s" % (word.start, word.end, word.word))

transcript = ' '.join(s.text for s in segments)
```
📚 Documentation
Model Description
This model is a fine-tuned version of Whisper Large V3 tailored specifically for transcribing Levantine Arabic, focusing on the Israeli dialect. It is designed to improve automatic speech recognition (ASR) performance for this particular variant of Arabic.
| Property | Details |
|----------|---------|
| Model Type | Fine-tuned Whisper Large V3 |
| Fine-tuned for | Levantine Arabic (Israeli dialect) |
| WER on test set | 33% |
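The reported figure is a word error rate (WER): the number of word substitutions, deletions, and insertions, divided by the number of words in the reference transcript. A minimal sketch of scoring your own transcriptions, assuming the third-party `jiwer` package (not part of this repo):

```python
# pip install jiwer
import jiwer

reference = "the ground-truth transcript for a test clip"
hypothesis = "the transcript produced by the model"

# WER = (substitutions + deletions + insertions) / reference word count
print(f"WER: {jiwer.wer(reference, hypothesis):.2%}")
```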
Training Data
The dataset used for training and fine-tuning this model consists of approximately 1,200 hours of transcribed audio, primarily featuring Israeli Levantine Arabic, along with some general Levantine Arabic content. The data sources include:
- Self-maintained Collection: 1,200 hours of audio data curated by the team, covering a wide range of Israeli Levantine Arabic speech.
| Property | Details |
|----------|---------|
| Total Dataset Size | ~1,200 hours |
| Sampling Rate | 8 kHz, upsampled to 16 kHz |
| Annotation | Human-transcribed and annotated for high accuracy |
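Audio fed to the model at inference time should match this 16 kHz rate. A minimal sketch of the upsampling step (file names are placeholders; this is not the exact preprocessing pipeline used for training):

```python
import librosa
import soundfile as sf

# Load an 8 kHz recording at its native rate, then upsample to 16 kHz
audio, sr = librosa.load("recording_8khz.wav", sr=None)
audio_16k = librosa.resample(audio, orig_sr=sr, target_sr=16000)
sf.write("recording_16khz.wav", audio_16k, 16000)
```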
📄 License
This project is licensed under the Apache-2.0 license.