Whisper-medium-Oswald Open-Source Speech Recognition Model - Free Deployment for High-Accuracy Transcription of Haitian Creole

Whisper Medium Oswald

Developed by jsbeaudry

Haitian Creole speech recognition model fine-tuned from OpenAI Whisper-medium, focused on high-accuracy transcription

Speech Recognition

Transformers

Open Source License: Apache-2.0

Tags: #Haitian Creole ASR #High-precision speech transcription #Synthetic voice adaptation

Downloads 102

Release Time : 4/14/2025

Model Overview

This model is an automatic speech recognition (ASR) system optimized for Haitian Creole. It adapts Whisper to the language through transfer learning and is intended to transcribe a range of accents and speech scenarios.

Model Features

High-accuracy transcription

Optimized for Haitian Creole, with a stated goal of 99% transcription accuracy on everyday speech

Accent adaptability

Capable of processing Haitian Creole speech with different regional accents and speaking styles

Synthetic voice adaptation

Specifically trained with female synthetic voice data to optimize synthetic speech recognition

Model Capabilities

Haitian Creole speech recognition

Audio transcription

Supports 16kHz sample rate audio processing

Use Cases

Speech transcription services

Voice memo transcription

Convert daily Haitian Creole voice memos into searchable text

Radio program transcription

Automatically generate transcripts for Haitian radio programs

Voice interaction applications

Haitian Creole voice assistant

Provide native language voice interaction experience for Haitian users

Language learning tool

Help learners practice Haitian Creole through speech

🚀 whisper-medium-creole-oswald

This model is a fine-tuned version of openai/whisper-medium on the creole-text-voice dataset. Its main goal is a Haitian Creole speech-to-text model with 99% accuracy, capable of transcribing diverse Haitian voices across accents, regions, and speaking styles.
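
The card does not say how the 99% accuracy target is measured. A common proxy for ASR quality is word error rate (WER), under which 99% accuracy would correspond roughly to a WER of 1%. The sketch below is one way to compute it, assuming whitespace tokenization, lowercasing, and Unicode NFC normalization so that accented Creole characters compare consistently. The function name `word_error_rate` and the sample sentences are illustrative and do not come from the model card.

```python
import unicodedata


def _normalize(text):
    """Lowercase, NFC-normalize, and split on whitespace (assumed scheme)."""
    return unicodedata.normalize("NFC", text).lower().split()


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = _normalize(reference), _normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "Bonjou, kijan ou ye jodi a"   # illustrative reference transcript
    hypothesis = "bonjou, kijan ou ye jodi"    # illustrative model output
    print(f"WER: {word_error_rate(reference, hypothesis):.2%}")
```

Punctuation is kept as part of each word here; stripping it before comparison is a design choice that would lower the reported error rate.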

🚀 Quick Start

This model can be used to transcribe Haitian Creole speech. You can use the following Python code examples to get started.

✨ Features

Optimized for Haitian Creole: whisper-medium-creole-oswald is specifically optimized for Haitian Creole automatic speech recognition (ASR).
Based on Whisper Architecture: It builds upon the Whisper architecture by OpenAI and adapts it to Haitian Creole through transfer learning and fine-tuning.
Diverse Intended Uses: Can be used for transcribing various types of Haitian Creole speech and enabling Creole voice interfaces in multiple applications.

📦 Installation

The code examples rely on several Python libraries such as transformers, librosa, torch, gradio, datasets, and tokenizers. You can install them using pip:

pip install transformers librosa torch gradio datasets tokenizers

💻 Usage Examples

Basic Usage

# Load the processor and model directly
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
import librosa
import torch

processor = AutoProcessor.from_pretrained("jsbeaudry/whisper-medium-oswald")
model = AutoModelForSpeechSeq2Seq.from_pretrained("jsbeaudry/whisper-medium-oswald")

def transcribe(audio_file_path):
    # 1. Load the audio and resample it to the 16 kHz rate the model expects
    speech_array, sampling_rate = librosa.load(audio_file_path, sr=16000)

    # 2. Convert the waveform into the input features Whisper consumes
    input_features = processor(
        speech_array, sampling_rate=sampling_rate, return_tensors="pt"
    ).input_features

    # 3. Generate predictions (no gradients are needed for inference)
    with torch.no_grad():
        predicted_ids = model.generate(input_features)

    # 4. Decode the predictions; batch_decode returns one string per input
    transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
    return transcription[0]

text = transcribe("/path_audio")

print(text)

Advanced Usage

from transformers import pipeline
import gradio as gr

# Load the fine-tuned Whisper model as a speech recognition pipeline
print("Loading model...")
pipe = pipeline("automatic-speech-recognition", model="jsbeaudry/whisper-medium-oswald")
print("Model loaded successfully.")

# Transcription function
def transcribe(audio_path):
    if audio_path is None:
        return "Please upload or record an audio file first."
    result = pipe(audio_path)
    return result["text"]

# Build Gradio interface
def create_interface():
    with gr.Blocks(title="Whisper Medium - Haitian Creole") as demo:
        gr.Markdown("# 🎙️ Whisper Medium Creole ASR")
        gr.Markdown(
            "Upload an audio file or record your voice in Haitian Creole. "
            "Then click **Transcribe** to see the result."
        )

        with gr.Row():
            with gr.Column():
                # One component handles both upload and microphone input.
                # Gradio 4+ takes `sources=[...]`; Gradio 3 used `source="upload"`.
                audio_input = gr.Audio(
                    sources=["upload", "microphone"],
                    type="filepath",
                    label="🎧 Upload or Record Audio",
                )
            with gr.Column():
                transcribe_button = gr.Button("🔍 Transcribe")
                output_text = gr.Textbox(label="📝 Transcribed Text", lines=4)

        # A single handler avoids two callbacks competing to write the same output
        transcribe_button.click(fn=transcribe, inputs=audio_input, outputs=output_text)

    return demo

if __name__ == "__main__":
    interface = create_interface()
    interface.launch()

📚 Documentation

Model Details

Property	Details
Model Type	whisper-medium-creole-oswald
Base Model	openai/whisper-medium
Fine-tuned for	Haitian Creole (Kreyòl Ayisyen)
Architecture	Whisper Medium
Vocabulary	Based on Latin script (Creole orthography), preserving diacritics and linguistic nuances
Voice types	Female synthetic voices
Sampling rate	16 kHz
Training objective	Maximize transcription accuracy for everyday Creole speech
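
Since the model expects 16 kHz input, it can help to check a file's sample rate before transcription. The sketch below uses only Python's standard `wave` module, so it assumes uncompressed PCM WAV input; the helper name `needs_resampling` is illustrative. In the Basic Usage example, `librosa.load(..., sr=16000)` already resamples on load, so a check like this is mainly useful when audio is handed to other tooling.

```python
import wave

TARGET_SAMPLE_RATE = 16_000  # sampling rate listed in the model details


def needs_resampling(wav_path, target_rate=TARGET_SAMPLE_RATE):
    """Return True if the PCM WAV file is not already at the target rate."""
    with wave.open(wav_path, "rb") as wav_file:
        return wav_file.getframerate() != target_rate
```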

Intended Uses

Transcribe Haitian Creole speech from voice notes, radio shows, interviews, public speeches, educational content, and synthetic voices.
Enable Creole voice interfaces in voice assistants, transcription services, language-learning tools, chatbots, and accessibility platforms.

Limitations

⚠️ Important Note

The model may struggle with heavily code-switched speech (Creole mixed with French or English), extremely poor audio quality (e.g., heavy background noise), very fast or mumbled speech in some dialects, and long-duration audio files.

It is not optimized for real-time transcription on low-resource devices.

Because it is fine-tuned on a specific dataset, it may generalize poorly to completely unseen voice types or rare accents.
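
One common way to work around the long-audio limitation is to split a recording into overlapping windows and transcribe each window separately. Whisper models operate on 30-second windows; the 5-second overlap and the helper name `chunk_bounds` below are assumptions for illustration, not settings from the model card. The Transformers ASR pipeline also accepts a `chunk_length_s` argument that performs chunking internally.

```python
def chunk_bounds(num_samples, sample_rate=16_000, chunk_s=30.0, overlap_s=5.0):
    """Return (start, end) sample indices of overlapping windows covering the audio."""
    if overlap_s >= chunk_s:
        raise ValueError("overlap_s must be smaller than chunk_s")
    chunk = int(chunk_s * sample_rate)
    step = chunk - int(overlap_s * sample_rate)

    bounds = []
    start = 0
    while start < num_samples:
        end = min(start + chunk, num_samples)
        bounds.append((start, end))
        if end == num_samples:
            break  # the last window reached the end of the audio
        start += step
    return bounds
```

Each `(start, end)` pair can be used to slice the loaded waveform, for example `speech_array[start:end]`, before passing it to the processor. Merging the overlapping text from neighbouring windows is left out of this sketch.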

Training and Evaluation Data

The model was trained on the creole-text-voice dataset, which includes 5 hours of synthetic Haitian Creole speech with annotated, time-aligned text transcripts that follow standard Creole orthography. Planned data sources for future iterations include public-domain radio and podcast archives, open-access interviews and spoken-word audio, and community-submitted voice samples. Preprocessing involves Voice Activity Detection (VAD), noise filtering and audio normalization, and manual transcript review and correction.
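
The card names the preprocessing steps but not the tools or parameters behind them. As a rough illustration only, the sketch below shows peak normalization and a simple energy-threshold VAD over fixed-size frames in plain Python. The frame length, the threshold, and both function names are assumptions; production pipelines typically rely on dedicated VAD models instead.

```python
def peak_normalize(samples, target_peak=0.95):
    """Scale samples so the largest absolute value equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def voiced_frames(samples, frame_len=400, energy_threshold=1e-3):
    """Return indices of full frames whose mean squared amplitude exceeds the threshold.

    frame_len=400 corresponds to 25 ms at 16 kHz (an assumed setting).
    A trailing partial frame is ignored.
    """
    voiced = []
    frame_starts = range(0, len(samples) - frame_len + 1, frame_len)
    for index, start in enumerate(frame_starts):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        if energy > energy_threshold:
            voiced.append(index)
    return voiced
```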

Training Hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 16
eval_batch_size: 8
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
num_epochs: 5
mixed_precision_training: Native AMP
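
To make the `linear` scheduler with 500 warmup steps concrete: the learning rate rises linearly from 0 to the peak of 1e-05 over the first 500 steps, then decays linearly to 0 by the end of training. The sketch below reproduces that shape in plain Python. The total number of training steps is not stated in the card, so `total = 2000` is a placeholder assumption, and the function name `linear_lr` is illustrative.

```python
PEAK_LR = 1e-05      # learning_rate from the list above
WARMUP_STEPS = 500   # lr_scheduler_warmup_steps from the list above


def linear_lr(step, total_steps, peak_lr=PEAK_LR, warmup_steps=WARMUP_STEPS):
    """Learning rate at a given optimizer step: linear warmup, then linear decay."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    total = 2000  # hypothetical; the real value depends on dataset size and epochs
    for step in (0, 250, 500, 1250, 2000):
        print(step, linear_lr(step, total))
```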

Framework Versions

Transformers 4.46.1
PyTorch 2.6.0+cu124
Datasets 3.5.0
Tokenizers 0.20.3
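
To compare a local environment with the versions listed above, installed package versions can be read with the standard-library `importlib.metadata` module. This is a small convenience sketch; the helper name `installed_versions` is illustrative, and packages that are not installed are reported as `None`.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in the model card. "torch" is the PyPI distribution name
# for PyTorch, and the "+cu124" build suffix is omitted here.
CARD_VERSIONS = {
    "transformers": "4.46.1",
    "torch": "2.6.0",
    "datasets": "3.5.0",
    "tokenizers": "0.20.3",
}


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, local in installed_versions(CARD_VERSIONS).items():
        print(f"{name}: card={CARD_VERSIONS[name]} installed={local}")
```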

📄 License

This model is licensed under the Apache-2.0 license.

📌 Citation

If you use this model, please cite:

@misc{whispermediumcreoleoswald2025,
  title={Whisper Medium Creole - Oswald},
  author={Jean sauvenel beaudry},
  year={2025},
  howpublished={\url{https://huggingface.co/solvexalab/whisper-medium-creole-oswald}}
}
