whisper-large-v3-persian-common-voice-17 Open Source Model - Improve the Accuracy of Persian Automatic Speech Recognition

Whisper Large V3 Persian Common Voice 17

Developed by msghol

A Persian automatic speech recognition model fine-tuned based on Whisper Large v3, trained using the Common Voice 17 dataset, significantly improving Persian recognition accuracy.

Speech Recognition

Transformers

Other | Open Source License: MIT | #Persian speech recognition #Large-scale dataset fine-tuning #Low word error rate

Downloads 442

Release Time: 3/15/2025

Model Overview

This is an automatic speech recognition model specifically optimized for Persian, based on OpenAI's Whisper Large v3 architecture, fine-tuned on the Persian subset of Mozilla Common Voice 17.

Model Features

Large-scale data training

Trained on over 250,000 Persian speech samples, up from the 83,000 samples used by earlier models trained on Common Voice 11, which improves recognition accuracy

Low word error rate

Achieved a word error rate (WER) of 21.43 in Persian speech recognition

Specialized optimization

Specifically optimized for Persian language characteristics, improving recognition accuracy and robustness for this language

Model Capabilities

Persian speech recognition

Long audio processing (supports 30-second chunks)

Use Cases

Speech-to-text

Persian meeting transcription

Automatically convert Persian meeting recordings into text transcripts

Improved accuracy, reduced word error rate

Persian media subtitle generation

Automatically generate subtitles for Persian video content

Increased subtitle production efficiency

🚀 Whisper Large v3 - Persian (Common Voice 17)

Whisper Large v3 fine-tuned on Common Voice 17 for enhanced Persian automatic speech recognition, offering lower error rates and greater accuracy.

🚀 Quick Start

Whisper Large v3 has been fine-tuned on Common Voice 17, leveraging over 250,000 Persian audio samples. This is a significant improvement over earlier models trained on Common Voice 11, which only had 83,000 samples. The larger dataset has led to a lower Word Error Rate (WER), enhancing the model's accuracy and robustness in recognizing Persian speech.

This update is a major step forward in Persian ASR, and we hope it benefits the Persian-speaking community, making high-quality speech recognition more accessible and reliable. 🚀

📦 Installation

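The model is used through the transformers library. If it is not already installed, install it with the following command. The pipeline also needs a deep learning backend such as PyTorch, and ffmpeg to decode audio files that are passed by path.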

pip install transformers

💻 Usage Examples

Basic Usage

from transformers import pipeline

asr_pipe = pipeline(
    "automatic-speech-recognition",
    model="msghol/whisper-large-v3-persian-common-voice-17",
    chunk_length_s=30
)

# Replace "your_file" with the path to an audio file.
text = asr_pipe("your_file")["text"]
print(text)
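
For the use cases listed earlier, such as meeting transcription or subtitle generation, several recordings usually need to be processed in one run. The following is a minimal sketch of that, not the author's method: `transcribe_files` and `load_asr` are hypothetical helper names introduced here for illustration, and the file names in the comments are placeholders. The pipeline call itself is the one from the Basic Usage example.

```python
from typing import Callable, Dict, Iterable


def transcribe_files(asr: Callable[[str], dict], paths: Iterable[str]) -> Dict[str, str]:
    """Run an ASR callable over several files and collect the text per file.

    `asr` is anything that maps a file path to a dict with a "text" key,
    which is what the transformers ASR pipeline returns for a single input.
    """
    results: Dict[str, str] = {}
    for path in paths:
        output = asr(str(path))
        results[str(path)] = output["text"].strip()
    return results


def load_asr():
    """Build the pipeline from the Basic Usage example (hypothetical wrapper)."""
    # Imported lazily so transcribe_files can be used without loading the model.
    from transformers import pipeline

    return pipeline(
        "automatic-speech-recognition",
        model="msghol/whisper-large-v3-persian-common-voice-17",
        chunk_length_s=30,
    )


# Example (downloads the model on first use; file names are placeholders):
# asr = load_asr()
# for name, text in transcribe_files(asr, ["part1.wav", "part2.wav"]).items():
#     print(name, text)
```

Passing the pipeline in as a plain callable keeps the loop independent of the model, so the same helper can be exercised with a stub before the checkpoint is downloaded.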

📚 Documentation

Model Information

Property	Details
Model Name	Whisper Large v3 - Persian (Common Voice 17)
Base Model	Whisper Large v3
Language	Persian (Farsi)
Dataset	Mozilla Common Voice 17 (Persian subset)
Hardware Used	NVIDIA A100 GPU
Batch Size	16
Training Steps	5000
WER (Word Error Rate)	21.43
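
The table reports WER without stating the unit or the evaluation setup; the value 21.43 is presumably a percentage, but that is an assumption. As a reminder of what the metric measures, here is a minimal sketch of a word-level WER computation in plain Python. `word_error_rate` is a hypothetical helper written for illustration, and it applies no text normalization, which a real evaluation would usually add.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

Multiplying the result by 100 gives the percentage form in which WER is commonly reported.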

Notes

⚠️ Important Note

Because the fine-tuning data did not include timestamps, the model cannot return timestamps, and requesting them raises an error. The workaround is to split audio files into smaller chunks, for example with chunk_length_s=30 as in the Basic Usage example.
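
One way to read the note above is to cut the decoded waveform into fixed-length segments before transcription. The following is a minimal sketch under that assumption: `split_into_chunks` is a hypothetical helper, not part of transformers, it assumes the audio is already decoded into a mono sample sequence, and its 30-second default simply mirrors the chunk_length_s=30 setting from the Basic Usage example.

```python
from typing import List, Sequence


def split_into_chunks(
    samples: Sequence[float], sample_rate: int, chunk_s: float = 30.0
) -> List[Sequence[float]]:
    """Split a mono waveform into consecutive chunks of at most chunk_s seconds."""
    chunk_len = int(sample_rate * chunk_s)
    if chunk_len < 1:
        raise ValueError("sample_rate * chunk_s must be at least one sample")
    # The last chunk may be shorter than chunk_len.
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]


# Example with a tiny synthetic signal: 10 samples at 2 Hz, 2-second chunks.
chunks = split_into_chunks(list(range(10)), sample_rate=2, chunk_s=2)
print([len(c) for c in chunks])
```

Cutting at fixed boundaries can split a word in two, so for most files the pipeline's own chunk_length_s option is likely the more convenient route; the manual split is mainly useful when the audio is already held in memory.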

💡 Usage Tip

Further fine-tuning should improve the model's accuracy. We are currently looking for sponsorships for hardware and for ASR dataset collaborations.

Citation

@misc{whisper_persian_cv17,
  author = {Mohammad Sadegh Gholizadeh},
  title = {Whisper Large v3 - Persian (Common Voice 17)},
  year = {2025},
  url = {https://huggingface.co/msghol/whisper-large-v3-persian-common-voice-17}
}

📄 License

This project is licensed under the Apache-2.0 license.
