Whisper-Persian-Turbooo Open-source Model - Free Deployment to Boost Persian Speech Recognition in the Medical Field

Whisper Persian Turbooo

Developed by hackergeek98

Persian automatic speech recognition model fine-tuned from OpenAI Whisper-large-v3-turbo and tagged for medical applications

Speech Recognition

Transformers

Other

Open Source License: MIT

#Persian speech recognition #Medical scenario optimization #Long audio chunk processing

Downloads 51

Release Time: 3/25/2025

Model Overview

This model is an automatic speech recognition (ASR) system for Persian, fine-tuned from Whisper-large-v3-turbo. The model card tags it for medical transcription.

Model Features

Persian optimization

Specifically optimized for Persian speech characteristics to improve recognition accuracy

Medical field support

Model tags indicate special suitability for medical field speech recognition scenarios

Long audio processing

The usage example splits long audio into 30-second chunks and transcribes each chunk in turn

Model Capabilities

Persian speech-to-text

Medical terminology recognition

Long audio automatic segmentation

Multiple audio format support

Use Cases

Healthcare

Medical record transcription

Convert Persian medical records dictated by doctors into text

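Reported score: 0.043175 (listed as Word Error Rate; the same value appears as the validation loss under Training Metrics, so it may not be a true WER)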

Telemedicine consultation records

Automatically transcribe Persian telemedicine consultation content

🚀 Whisper Persian Turbooo

This is an automatic speech recognition model based on the Whisper architecture, trained for Persian and tagged for medical scenarios.

🚀 Quick Start

Model Information

Property	Details
Model Type	Automatic Speech Recognition
Training Datasets	mozilla-foundation/common_voice_11_0
Evaluation Metrics	WER (Word Error Rate)
Base Model	openai/whisper-large-v3-turbo
Library Used	transformers
Tags	medical

Training Metrics

Training Loss: 0.013100
Validation Loss: 0.043175
Number of Epochs: 1
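
The model card names WER as its evaluation metric but does not show how it is computed. As a reminder, WER is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal illustration only: the function name `word_error_rate` is hypothetical, it assumes plain whitespace tokenization with no text normalization, and it is not the evaluation script used for this model.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()   # assumption: whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One deleted word out of five reference words
print(word_error_rate("the patient has a fever", "the patient has fever"))  # 0.2
```

Because WER counts insertions as errors, it can exceed 1.0 when the hypothesis is much longer than the reference.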

📦 Installation

To use this model in a Google Colab environment, you need to install the required packages. Run the following command:

!pip install torch torchaudio transformers pydub google-colab

💻 Usage Examples

Basic Usage

# Install required packages
!pip install torch torchaudio transformers pydub google-colab

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from pydub import AudioSegment
import os
from google.colab import files

# Load the model and processor
model_id = "hackergeek98/whisper-persian-turbooo"
device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id).to(device)
processor = AutoProcessor.from_pretrained(model_id)

# Create pipeline
whisper_pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    device=0 if torch.cuda.is_available() else -1,
)

# Convert audio to WAV format
def convert_to_wav(audio_path):
    audio = AudioSegment.from_file(audio_path)
    wav_path = "converted_audio.wav"
    audio.export(wav_path, format="wav")
    return wav_path

# Split long audio into chunks
def split_audio(audio_path, chunk_length_ms=30000):  # Default: 30 sec per chunk
    audio = AudioSegment.from_wav(audio_path)
    chunks = [audio[i:i+chunk_length_ms] for i in range(0, len(audio), chunk_length_ms)]
    chunk_paths = []
    
    for i, chunk in enumerate(chunks):
        chunk_path = f"chunk_{i}.wav"
        chunk.export(chunk_path, format="wav")
        chunk_paths.append(chunk_path)
    
    return chunk_paths

# Transcribe a long audio file
def transcribe_long_audio(audio_path):
    wav_path = convert_to_wav(audio_path)
    chunk_paths = split_audio(wav_path)
    transcription = ""
    
    for chunk in chunk_paths:
        result = whisper_pipe(chunk)
        transcription += result["text"] + "\n"
        os.remove(chunk)  # Remove processed chunk
    
    os.remove(wav_path)  # Remove the intermediate WAV file (the uploaded original is kept)
    
    # Save transcription to a text file
    text_path = "transcription.txt"
    with open(text_path, "w", encoding="utf-8") as f:  # UTF-8 so Persian text is written correctly
        f.write(transcription)
    
    return text_path

# Upload and process audio in Colab
uploaded = files.upload()
audio_file = list(uploaded.keys())[0]
transcription_file = transcribe_long_audio(audio_file)

# Download the transcription file
files.download(transcription_file)
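
The script above cuts the audio at fixed 30-second marks, so a word that straddles a boundary can be split between two chunks. One possible mitigation, shown here only as a sketch, is to compute chunk boundaries with a small overlap. The function name `chunk_bounds` and the `overlap_ms` parameter are hypothetical and not part of the original example; with `overlap_ms=0` the boundaries match those produced by `split_audio`.

```python
def chunk_bounds(total_ms, chunk_ms=30_000, overlap_ms=0):
    """Return (start, end) millisecond pairs that cover [0, total_ms)."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    if not 0 <= overlap_ms < chunk_ms:
        raise ValueError("overlap_ms must satisfy 0 <= overlap_ms < chunk_ms")

    step = chunk_ms - overlap_ms  # distance between consecutive chunk starts
    bounds = []
    start = 0
    while start < total_ms:
        end = min(start + chunk_ms, total_ms)
        bounds.append((start, end))
        if end == total_ms:  # last chunk reached the end of the audio
            break
        start += step
    return bounds


# A 70-second file, without and with a 5-second overlap
print(chunk_bounds(70_000))
# [(0, 30000), (30000, 60000), (60000, 70000)]
print(chunk_bounds(70_000, overlap_ms=5_000))
# [(0, 30000), (25000, 55000), (50000, 70000)]
```

Each pair can be used to slice a pydub `AudioSegment` as `audio[start:end]`. Note that overlapping chunks produce repeated words in the concatenated transcript, so the overlapping text would still need to be deduplicated, which this sketch does not attempt.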

📄 License

This project is licensed under the MIT License.
