# whisper-fa-tinyyy
This model is a fine-tuned version of openai/whisper-tiny on the common_voice_11_0 dataset and can be used for automatic speech recognition.
## Quick Start
This model is a fine-tuned version of openai/whisper-tiny on the common_voice_11_0 dataset.
It achieves the following result on the evaluation set:

- Loss: 0.0246
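For a quick test, the checkpoint can be loaded straight into a transformers pipeline. A minimal sketch (`sample.wav` is a placeholder file name; decoding an audio path this way requires ffmpeg):

```python
from transformers import pipeline

# Load the fine-tuned checkpoint into an ASR pipeline
asr = pipeline("automatic-speech-recognition", model="hackergeek98/whisper-fa-tinyyy")

# "sample.wav" is a placeholder; pass any audio file path
print(asr("sample.wav")["text"])
```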
## Features
- Fine-tuned: Based on the openai/whisper-tiny model, fine-tuned on the common_voice_11_0 dataset.
- Low loss: Achieves a loss of 0.0246 on the evaluation set.
## Installation
The original README provides no installation steps for the model itself. To use the model in Colab, install the required packages (the `google.colab` module ships with the Colab runtime and does not need to be installed separately):

```bash
!pip install torch torchaudio transformers pydub
```
## Usage Examples
### Basic Usage
Here is how to use the model in Colab:
```python
import os

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from pydub import AudioSegment
from google.colab import files

# Load the fine-tuned model and its processor
model_id = "hackergeek98/whisper-fa-tinyyy"
device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id).to(device)
processor = AutoProcessor.from_pretrained(model_id)

# Build an ASR pipeline around the model
whisper_pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    device=0 if torch.cuda.is_available() else -1,
)

def convert_to_wav(audio_path):
    """Convert any audio format pydub can read to WAV."""
    audio = AudioSegment.from_file(audio_path)
    wav_path = "converted_audio.wav"
    audio.export(wav_path, format="wav")
    return wav_path

def split_audio(audio_path, chunk_length_ms=30000):
    """Split audio into 30-second chunks, matching Whisper's input window."""
    audio = AudioSegment.from_wav(audio_path)
    chunks = [audio[i:i + chunk_length_ms] for i in range(0, len(audio), chunk_length_ms)]
    chunk_paths = []
    for i, chunk in enumerate(chunks):
        chunk_path = f"chunk_{i}.wav"
        chunk.export(chunk_path, format="wav")
        chunk_paths.append(chunk_path)
    return chunk_paths

def transcribe_long_audio(audio_path):
    """Transcribe audio of arbitrary length chunk by chunk."""
    wav_path = convert_to_wav(audio_path)
    chunk_paths = split_audio(wav_path)
    transcription = ""
    for chunk in chunk_paths:
        result = whisper_pipe(chunk)
        transcription += result["text"] + "\n"
        os.remove(chunk)  # clean up each chunk after transcribing it
    os.remove(wav_path)
    # Write the full transcription to a text file
    text_path = "transcription.txt"
    with open(text_path, "w") as f:
        f.write(transcription)
    return text_path

# Upload an audio file, transcribe it, and download the result
uploaded = files.upload()
audio_file = list(uploaded.keys())[0]
transcription_file = transcribe_long_audio(audio_file)
files.download(transcription_file)
```
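Since the model targets Persian, it can help to pin the language and task at generation time rather than relying on Whisper's automatic language detection. A hedged sketch reusing the `whisper_pipe` defined above (`my_audio.wav` is a placeholder; verify `generate_kwargs` support against your installed transformers version):

```python
# Force Persian transcription instead of auto-detecting the language
result = whisper_pipe(
    "my_audio.wav",  # placeholder path
    generate_kwargs={"language": "persian", "task": "transcribe"},
)
print(result["text"])
```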
## Technical Details
### Training hyperparameters
The following hyperparameters were used during training (a sketch of the corresponding training arguments follows the list):
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 1
- mixed_precision_training: Native AMP
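For illustration, the hyperparameters above map onto transformers training arguments roughly as follows. This is a reconstruction, not the original training script, and `output_dir` is a placeholder:

```python
from transformers import Seq2SeqTrainingArguments

# Reconstructed from the hyperparameters listed above
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-fa-tinyyy",  # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,  # effective train batch size: 8 * 4 = 32
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=1,
    seed=42,
    fp16=True,  # native AMP mixed precision
)
```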
### Training results
| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.0186        | 0.9998 | 2357 | 0.0246          |
### Framework versions
- Transformers 4.49.0
- PyTorch 2.6.0+cu124
- Datasets 3.4.1
- Tokenizers 0.21.1
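To reproduce this environment, the versions above can be pinned at install time (a sketch; pick the PyTorch build matching your CUDA setup):

```bash
pip install transformers==4.49.0 torch==2.6.0 datasets==3.4.1 tokenizers==0.21.1
```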
## License
This model is licensed under the MIT license.