whisper-medium-creole-oswald
This model is a fine-tuned version of openai/whisper-medium on the creole-text-voice dataset. Its main goal is to create a 99% accurate Haitian Creole speech-to-text model, capable of transcribing diverse Haitian voices across accents, regions, and speaking styles.
Quick Start
This model can be used to transcribe Haitian Creole speech. You can use the following Python code examples to get started.
Features
- Optimized for Haitian Creole: whisper-medium-creole-oswald is specifically optimized for Haitian Creole automatic speech recognition (ASR).
- Based on Whisper Architecture: It builds upon the Whisper architecture by OpenAI and adapts it to Haitian Creole through transfer learning and fine-tuning.
- Diverse Intended Uses: Can be used for transcribing various types of Haitian Creole speech and enabling Creole voice interfaces in multiple applications.
Installation
The code examples rely on several Python libraries: transformers, librosa, torch, gradio, datasets, and tokenizers. You can install them using pip:
pip install transformers librosa torch gradio datasets tokenizers
Usage Examples
Basic Usage
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
import librosa
import torch

# Load the processor (feature extractor + tokenizer) and the fine-tuned model
processor = AutoProcessor.from_pretrained("jsbeaudry/whisper-medium-oswald")
model = AutoModelForSpeechSeq2Seq.from_pretrained("jsbeaudry/whisper-medium-oswald")

def transcript(audio_file_path):
    # Load the audio and resample it to the 16 kHz rate the model expects
    speech_array, sampling_rate = librosa.load(audio_file_path, sr=16000)
    # Convert the waveform to log-mel input features
    input_features = processor(speech_array, sampling_rate=sampling_rate, return_tensors="pt").input_features
    # Generate token IDs without tracking gradients, then decode them to text
    with torch.no_grad():
        predicted_ids = model.generate(input_features)
    transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
    return transcription  # list with one string per input

text = transcript("/path_audio")
print(text)
Advanced Usage
from transformers import pipeline
import gradio as gr

print("Loading model...")
pipe = pipeline("automatic-speech-recognition", model="jsbeaudry/whisper-medium-oswald")
print("Model loaded successfully.")

def transcribe(audio_path):
    # audio_path is None when nothing has been uploaded or recorded yet
    if audio_path is None:
        return "Please upload or record an audio file first."
    result = pipe(audio_path)
    return result["text"]

def create_interface():
    with gr.Blocks(title="Whisper Medium - Haitian Creole") as demo:
        gr.Markdown("# Whisper Medium Creole ASR")
        gr.Markdown(
            "Upload an audio file or record your voice in Haitian Creole. "
            "Then click **Transcribe** to see the result."
        )
        with gr.Row():
            with gr.Column():
                # Gradio 4+ takes `sources` (a list); Gradio 3.x used `source="upload"`.
                # One component handles both inputs, so a single click handler suffices.
                audio_input = gr.Audio(
                    sources=["upload", "microphone"],
                    type="filepath",
                    label="Upload or Record Audio",
                )
            with gr.Column():
                transcribe_button = gr.Button("Transcribe")
                output_text = gr.Textbox(label="Transcribed Text", lines=4)
        transcribe_button.click(fn=transcribe, inputs=audio_input, outputs=output_text)
    return demo

if __name__ == "__main__":
    interface = create_interface()
    interface.launch()
Documentation
Model Details
| Property | Details |
|---|---|
| Model Type | whisper-medium-creole-oswald |
| Base Model | openai/whisper-medium |
| Fine-tuned for | Haitian Creole (Kreyòl Ayisyen) |
| Architecture | Whisper Medium |
| Vocabulary | Based on Latin script (Creole orthography), preserving diacritics and linguistic nuances |
| Voice types | Synthetic female voices |
| Sampling rate | 16 kHz |
| Training objective | Maximize transcription accuracy for everyday Creole speech |
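The card does not state how transcription accuracy is measured. A common metric for ASR is word error rate (WER), under which a 99% accuracy target would roughly correspond to a WER near 1%. The sketch below assumes that metric; `word_error_rate` is a hypothetical helper, not part of the model's evaluation code, and the sample sentences are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] is the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Illustrative strings only, not taken from the dataset.
reference = "mwen renmen manje diri ak pwa"
hypothesis = "mwen renmen manje diri ak pwa wouj"
print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
```

Here the hypothesis contains one extra word against a six-word reference, so the WER is one sixth.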
Intended Uses
- Transcribe Haitian Creole speech from voice notes, radio shows, interviews, public speeches, educational content, and synthetic voices.
- Enable Creole voice interfaces in voice assistants, transcription services, language-learning tools, chatbots, and accessibility platforms.
Limitations
Important Note
- The model may struggle with heavily code-switched speech (Creole + French/English mixed), extremely poor audio quality (e.g., heavy background noise), very fast or mumbled speech in some dialects, and long-duration audio files.
- It is not optimized for real-time transcription on low-resource devices.
- Since it is fine-tuned on a specific dataset, it might generalize less to completely unseen voice types or rare accents.
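For the long-duration limitation, one common workaround is to split a recording into overlapping windows and transcribe each window separately, since Whisper's feature extractor works on 30-second segments. The sketch below only computes the sample ranges; `chunk_ranges` is a hypothetical helper, and the 30-second window and 5-second overlap are assumed values, not settings from this card.

```python
def chunk_ranges(num_samples, sampling_rate=16000, window_s=30.0, overlap_s=5.0):
    """Return (start, end) sample indices of overlapping windows covering the audio."""
    window = int(window_s * sampling_rate)
    step = window - int(overlap_s * sampling_rate)
    if step <= 0:
        raise ValueError("overlap_s must be smaller than window_s")

    ranges = []
    start = 0
    while start < num_samples:
        end = min(start + window, num_samples)
        ranges.append((start, end))
        if end == num_samples:
            break  # the last window reached the end of the audio
        start += step
    return ranges


# A 70-second recording at 16 kHz (hypothetical length).
for start, end in chunk_ranges(70 * 16000):
    print(f"{start / 16000:.0f}s to {end / 16000:.0f}s")
```

Each range can be used to slice the waveform before passing it to the processor. Merging the overlapping transcripts is left out of this sketch; the Transformers ASR pipeline also accepts a `chunk_length_s` argument that may handle this for you.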
Training and Evaluation Data
The model was trained on the creole-text-voice dataset, which includes 5 hours of synthetic Haitian Creole speech with annotated, time-aligned text transcripts that follow standard Creole orthography. Planned data sources for next steps include public domain radio and podcast archives, open-access interviews and spoken-word audio, and community-submitted voice samples. Preprocessing involves Voice Activity Detection (VAD), noise filtering, audio normalization, and manual transcript review and correction.
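The card does not say which tools were used for preprocessing. As a rough illustration of two of the steps named above, the sketch below uses peak normalization and a simple frame-energy threshold as a stand-in for VAD. Both techniques, the helper names, and the threshold values are assumptions chosen for the example, not the author's pipeline.

```python
import numpy as np


def peak_normalize(audio, target_peak=0.95):
    """Scale the waveform so its largest absolute sample equals target_peak."""
    peak = np.max(np.abs(audio))
    if peak == 0:
        return audio.copy()  # silent input: nothing to scale
    return audio * (target_peak / peak)


def energy_vad(audio, sampling_rate=16000, frame_ms=30, threshold=0.01):
    """Return one boolean per frame: True where the frame's RMS exceeds threshold."""
    frame_len = int(sampling_rate * frame_ms / 1000)
    num_frames = len(audio) // frame_len
    # Drop the trailing partial frame, then view the audio as (frames, samples)
    frames = audio[: num_frames * frame_len].reshape(num_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return rms > threshold


# Synthetic test signal: 0.5 s of silence followed by 0.5 s of a 220 Hz tone.
sr = 16000
t = np.arange(sr // 2) / sr
signal = np.concatenate([np.zeros(sr // 2), 0.3 * np.sin(2 * np.pi * 220 * t)])

normalized = peak_normalize(signal)
speech_mask = energy_vad(normalized, sampling_rate=sr)
print(f"{speech_mask.sum()} of {len(speech_mask)} frames flagged as active")
```

An energy threshold is far cruder than a trained VAD model and will misfire on noisy recordings; it is shown here only to make the idea concrete.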
Training Hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: OptimizerNames.ADAMW_TORCH with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 5
- mixed_precision_training: Native AMP
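To make the scheduler settings concrete: with a linear schedule and 500 warmup steps, the learning rate rises linearly from 0 to 1e-05 over the first 500 optimizer steps, then decays linearly to 0 by the final step. The sketch below reproduces that shape in plain Python. The total number of training steps is not given in this card, so `total_steps=2000` is a hypothetical value, and `linear_schedule_lr` is an illustrative helper.

```python
def linear_schedule_lr(step, base_lr=1e-05, warmup_steps=500, total_steps=2000):
    """Learning rate at a given optimizer step: linear warmup, then linear decay."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


# total_steps=2000 is an assumed value for illustration.
for step in (0, 250, 500, 1250, 2000):
    print(step, linear_schedule_lr(step))
```

The peak rate of 1e-05 is reached exactly at step 500, the end of warmup.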
Framework Versions
- Transformers 4.46.1
- PyTorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.20.3
License
This model is licensed under the Apache-2.0 license.
Citation
If you use this model, please cite:
@misc{whispermediumcreoleoswald2025,
title={Whisper Medium Creole - Oswald},
author={Jean Sauvenel Beaudry},
year={2025},
howpublished={\url{https://huggingface.co/solvexalab/whisper-medium-creole-oswald}}
}