The open-source model whisper-large-v3-persian-common-voice-17: Improving the accuracy and robustness of Persian speech recognition

Whisper Large V3 Persian Common Voice 17

Developed by MohammadGholizadeh

A Persian automatic speech recognition (ASR) model fine-tuned from Whisper Large v3 on the Common Voice 17 dataset, which contains over 250,000 Persian audio samples. The larger training set improves recognition accuracy and robustness.

Speech Recognition

Transformers

Open Source License: Apache-2.0 #Persian Speech Recognition #High-precision ASR #Data Augmentation Training

Downloads: 978

Release Time: 3/15/2025

Model Overview

This is an automatic speech recognition (ASR) model optimized for Persian, intended to give the Persian-speaking community more accurate and reliable speech recognition.

Model Features

Larger Training Dataset

Fine-tuned on the Common Voice 17 dataset, which contains over 250,000 Persian audio samples, lowering the Word Error Rate (WER) relative to earlier models trained on the smaller Common Voice 11.

Model Accuracy

Achieves a reported WER of 21.43 on Persian speech recognition.

Model Capabilities

Persian Speech Recognition

High-precision Speech-to-Text

Use Cases

Speech-to-Text

Persian Speech Transcription

Transcribes Persian speech to text, for example voice recordings or meeting minutes.

Reported Word Error Rate (WER): 21.43
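WER counts the word-level substitutions (S), deletions (D) and insertions (I) needed to turn the model's transcript into the reference, divided by the number of reference words (N): WER = (S + D + I) / N. The minimal sketch below computes it with a word-level Levenshtein distance; the function name word_error_rate is hypothetical and not part of this model's tooling. The card does not state the scale of 21.43; if it is a percentage (common in Whisper fine-tuning write-ups), it corresponds to roughly 21 word errors per 100 reference words.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# One substituted word out of four reference words gives a WER of 0.25 (25%)
print(word_error_rate("من به مدرسه رفتم", "من به مدرسه رفت"))
```

Multiply the result by 100 to compare it with figures reported as percentages.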

🚀 Whisper Large v3 - Persian (Common Voice 17)

This model is Whisper Large v3 fine-tuned on the Common Voice 17 dataset to improve the accuracy of Persian automatic speech recognition.

🚀 Quick Start

Whisper Large v3 has been fine-tuned on Common Voice 17, which provides over 250,000 Persian audio samples, roughly three times the 83,000 samples in the Common Voice 11 data used by earlier models. The larger dataset lowered the Word Error Rate (WER), making the model more accurate and robust on Persian speech.

This update marks a major step forward in Persian ASR, and we hope it benefits the Persian-speaking community, making high-quality speech recognition more accessible and reliable. 🚀

📦 Installation

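The original README gives no installation steps. The usage example requires the Hugging Face transformers library with PyTorch; passing a file path to the pipeline also requires ffmpeg on the system to decode the audio.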

💻 Usage Examples

Basic Usage

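from transformers import pipeline

# Load the fine-tuned model; chunk_length_s=30 enables chunked inference on audio longer than 30 s
asr_pipe = pipeline(
    "automatic-speech-recognition",
    model="MohammadGholizadeh/whisper-large-v3-persian-common-voice-17",
    chunk_length_s=30,
)

# Replace "your_file" with the path to an audio file (e.g. .wav or .mp3)
text = asr_pipe("your_file")["text"]
print(text)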
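The pipeline also accepts audio already in memory, as a dict with "raw" and "sampling_rate" keys. The sketch below assumes numpy and a PCM array obtained by some other means (the card does not cover loading audio); it converts integer or multi-channel samples into the mono float32 waveform that form expects. The helper name to_pipeline_input is hypothetical. Whisper models operate on 16 kHz audio, so supplying 16 kHz input avoids depending on resampling inside the pipeline.

```python
import numpy as np


def to_pipeline_input(pcm: np.ndarray, sampling_rate: int) -> dict:
    """Hypothetical helper: build the {"raw", "sampling_rate"} dict for the ASR pipeline.

    pcm: shape (num_samples,) or (num_samples, num_channels), integer or float samples.
    """
    audio = np.asarray(pcm)
    if np.issubdtype(audio.dtype, np.integer):
        # Scale integer PCM (e.g. int16) to roughly [-1.0, 1.0]
        scale = float(np.iinfo(audio.dtype).max)
        audio = audio.astype(np.float32) / scale
    else:
        audio = audio.astype(np.float32)
    if audio.ndim == 2:
        audio = audio.mean(axis=1)  # downmix all channels to mono
    elif audio.ndim != 1:
        raise ValueError("expected a 1-D or 2-D (samples, channels) array")
    return {"raw": audio, "sampling_rate": sampling_rate}


# Usage with asr_pipe from the example above (not executed here):
# text = asr_pipe(to_pipeline_input(samples, 16_000))["text"]
```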

📚 Documentation

Property	Details
Model Name	Whisper Large v3 - Persian (Common Voice 17)
Base Model	Whisper Large v3
Language	Persian (Farsi)
Dataset	Mozilla Common Voice 17 (Persian subset)
Hardware Used	NVIDIA A100 GPU
Batch Size	16
Training Steps	5000
WER (Word Error Rate)	21.43
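As a back-of-the-envelope reading of the table (not a figure stated by the author), the batch size and step count bound how much data training covered. The sketch assumes no gradient accumulation and treats the "over 250,000" samples as a lower bound on the training-set size, as if every sample were in the training split; under those assumptions training covered at most about a third of one pass over the data.

```python
# Figures from the table above
BATCH_SIZE = 16
TRAINING_STEPS = 5000
# Assumption: no gradient accumulation, so each step consumes one batch
GRAD_ACCUM_STEPS = 1
# Assumption: "over 250,000" samples taken as a lower bound on the training split size
DATASET_SIZE_LOWER_BOUND = 250_000

seen = BATCH_SIZE * GRAD_ACCUM_STEPS * TRAINING_STEPS
max_epochs = seen / DATASET_SIZE_LOWER_BOUND  # upper bound, since the dataset is larger
print(seen, round(max_epochs, 2))  # 80000 0.32
```

If the actual training split were smaller, or gradient accumulation was used, the epoch count would be higher.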

📄 License

The model is licensed under the Apache-2.0 license.

📚 Notes

⚠️ Important Note

Because the fine-tuning data contained no timestamps, the model cannot return timestamps; requesting them (for example with return_timestamps=True in the pipeline) raises an error. The workaround is to split audio files into smaller chunks and transcribe each chunk separately. Further fine-tuning should improve the model's accuracy. We are currently seeking hardware sponsorship and ASR dataset collaborations.
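The chunking workaround can be sketched as follows, assuming a mono numpy waveform. Each chunk's start offset gives coarse, chunk-level timing in place of the timestamps the model cannot produce. The function name split_into_chunks and the 30-second default are illustrative assumptions, not part of the original README.

```python
import numpy as np


def split_into_chunks(audio: np.ndarray, sampling_rate: int, chunk_s: float = 30.0):
    """Hypothetical helper: split a mono waveform into consecutive, non-overlapping chunks.

    Returns a list of (start_time_seconds, chunk_array) pairs.
    """
    chunk_len = int(round(chunk_s * sampling_rate))
    if chunk_len < 1:
        raise ValueError("chunk_s * sampling_rate must be at least one sample")
    return [
        (start / sampling_rate, audio[start:start + chunk_len])
        for start in range(0, len(audio), chunk_len)
    ]


# Usage with asr_pipe from the usage example (not executed here):
# for start, chunk in split_into_chunks(waveform, 16_000):
#     text = asr_pipe({"raw": chunk, "sampling_rate": 16_000})["text"]
#     print(f"[{start:7.1f}s] {text}")
```

Because the chunks do not overlap, a word spoken across a boundary may be cut; splitting at pauses in the audio would reduce this, at the cost of more complex code.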

BibTeX Citation

@misc{whisper_persian_cv17,
  author = {Mohammad Sadegh Gholizadeh},
  title = {Whisper Large v3 - Persian (Common Voice 17)},
  year = {2025},
  url = {https://huggingface.co/msghol/whisper-large-v3-persian-common-voice-17}
}
