Whisper Cantonese Model
This is a fine-tuned Whisper large-v3 model for automatic speech recognition (ASR) in Cantonese (Yue). On the Common Voice 17 yue test split it reaches a character error rate (CER) of 7.26.
Quick Start
Load the model with the Hugging Face Transformers library, replacing the placeholder `your_username/whisper-cantonese` with the model's repository ID:
from transformers import WhisperProcessor, WhisperForConditionalGeneration
model = WhisperForConditionalGeneration.from_pretrained("your_username/whisper-cantonese")
processor = WhisperProcessor.from_pretrained("your_username/whisper-cantonese")
⨠Features
- Specifically fine - tuned for Cantonese (Yue) automatic speech recognition.
- Trained on the Common Voice 17 dataset for 10 epochs with a learning rate of 1e - 7.
- Can be used in various applications such as voice assistants, transcription services, and accessibility features for Cantonese speakers.
Installation
This model can be loaded using the Hugging Face Transformers library. Make sure the `transformers` library is installed in your Python environment. You can install it with the following command:
pip install transformers
Usage Examples
Basic Usage
from transformers import WhisperProcessor, WhisperForConditionalGeneration
model = WhisperForConditionalGeneration.from_pretrained("your_username/whisper-cantonese")
processor = WhisperProcessor.from_pretrained("your_username/whisper-cantonese")
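The snippet above only loads the weights. A minimal sketch of a full transcription call might look like the following. It assumes the audio is already a mono waveform sampled at 16 kHz and no longer than 30 seconds (the window Whisper processes in one pass), and that PyTorch is installed. The helper names `load_model` and `transcribe` are illustrative and not part of this model card. The repository ID is the card's placeholder; the citation at the end of this card lists `khleeloo/whisper-large-v3-cantonese`.

```python
MODEL_ID = "your_username/whisper-cantonese"  # placeholder repo ID used in this card


def load_model(model_id=MODEL_ID):
    """Load the processor and model (hypothetical helper, not from the card)."""
    # Imported lazily so this module can be imported without loading Transformers.
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    processor = WhisperProcessor.from_pretrained(model_id)
    model = WhisperForConditionalGeneration.from_pretrained(model_id)
    model.eval()  # inference mode: disables dropout
    return processor, model


def transcribe(waveform, processor, model, sampling_rate=16000):
    """Transcribe one mono waveform (1-D float array, assumed 16 kHz, <= 30 s)."""
    import torch

    # Convert raw audio into the log-Mel features Whisper expects.
    inputs = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        predicted_ids = model.generate(inputs.input_features)
    # Decode token IDs to text, dropping special tokens such as timestamps.
    return processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
```

With a waveform loaded by an audio library of your choice, usage would be `processor, model = load_model()` followed by `text = transcribe(waveform, processor, model)`.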
Documentation
Model Details
| Property | Details |
|---|---|
| Model Type | Whisper v3 |
| Language | Cantonese (Yue) |
| Training Data | Common Voice 17 |
| Training Duration | 10 epochs |
| Learning Rate | 1e-7 |
| Frozen Layers | 12 layers in the decoder are frozen during training |
| Developed by | khleeloo (Rita Frieske) |
| License | apache-2.0 |
| Finetuned from model | openai/whisper-large-v3 |
Uses
This model is intended for researchers and developers interested in building applications that require speech recognition capabilities in Cantonese. It can be used in various applications, including voice assistants, transcription services, and accessibility features for Cantonese speakers.
Bias, Risks, and Limitations
Important Note
The model is specifically fine-tuned for Cantonese and may not perform well on other languages or dialects. Performance may vary with the quality of the audio input and the speaker's accent. The model's effectiveness depends on the diversity and richness of the training data.
Training
Training Data
- mozilla-foundation/common_voice_17_0
Evaluation
Testing Data, Factors & Metrics
Common Voice 17_0 yue test split, Common Voice 15_0 yue test split, and Common Voice 15_0 zh-HK test split (these test datasets were used to evaluate Whisper large v3).
Metrics
Character Error Rate (CER), since written Cantonese uses Chinese characters rather than space-delimited words.
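CER is conventionally defined as the character-level edit distance (substitutions, deletions, and insertions) between the reference and the hypothesis, divided by the number of reference characters. The sketch below illustrates that definition in plain Python. It is not the evaluation script used for this model, and the whitespace stripping is an assumption, since the card does not describe its text normalization.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, computed row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance / number of reference characters."""
    # Assumption: spaces are ignored; the card does not state its normalization.
    ref = reference.replace(" ", "")
    hyp = hypothesis.replace(" ", "")
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, hyp) / len(ref)


# One substituted character out of five reference characters.
print(cer("我哋去食飯", "我地去食飯"))  # 0.2
```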
Results
| Model | CV 15_0 zh-HK | CV 15_0 yue | CV 17_0 yue |
|---|---|---|---|
| Whisper large v3 | 10.8 | 16 | - |
| Whisper cantonese (ours) | 18.88 | 8.77 | 7.26 |
Explanation: because this is a speech recognition model, it was trained on the yue data, which is closer to vernacular (spoken) Cantonese, rather than on the zh-HK data, which contains more written-style Cantonese. This explains its weaker performance on the zh-HK split of the Common Voice dataset.
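To put the two Common Voice 15_0 columns in perspective, the relative change in CER between the baseline and the fine-tuned model can be computed directly from the reported numbers. This is only arithmetic on the values in the results table. The helper name `relative_change` is illustrative.

```python
def relative_change(baseline, finetuned):
    """Relative change of the fine-tuned CER with respect to the baseline CER."""
    return (finetuned - baseline) / baseline


# (Whisper large v3, Whisper cantonese) CER values from the results table.
results = {
    "CV 15_0 zh-HK": (10.8, 18.88),
    "CV 15_0 yue": (16.0, 8.77),
}

for split, (baseline, finetuned) in results.items():
    print(f"{split}: {relative_change(baseline, finetuned):+.1%}")
# CV 15_0 zh-HK: +74.8%
# CV 15_0 yue: -45.2%
```

A negative value means the fine-tuned model makes fewer character errors than the baseline on that split.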
Technical Details
This model is a fine-tuned version of the Whisper v3 model for Cantonese (Yue) automatic speech recognition. It was fine-tuned on the Common Voice 17 dataset for 10 epochs with a learning rate of 1e-7. During training, 12 layers in the decoder were frozen.
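The card does not say which 12 decoder layers were frozen or how the freezing was implemented. As a rough sketch, assuming the first 12 decoder blocks were the frozen ones, freezing in PyTorch amounts to disabling gradients for their parameters. The helper `freeze_first_layers` is hypothetical; in Transformers, the Whisper decoder blocks are exposed as `model.model.decoder.layers`.

```python
def freeze_first_layers(layers, num_frozen=12):
    """Disable gradients for the first `num_frozen` layers.

    `layers` is any sequence of modules exposing `.parameters()`, for example
    `model.model.decoder.layers` on a WhisperForConditionalGeneration instance.
    Returns the number of parameter tensors that were frozen.
    """
    frozen = 0
    for layer in list(layers)[:num_frozen]:
        for param in layer.parameters():
            param.requires_grad = False  # excluded from gradient updates
            frozen += 1
    return frozen


# Assumed usage (which layers to freeze is a guess, not stated in the card):
# freeze_first_layers(model.model.decoder.layers, num_frozen=12)
```

The remaining decoder layers and the encoder would stay trainable under this assumption.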
License
This model is released under the apache-2.0 license.
Citation
BibTeX:
@misc {rita_frieske_2025,
author = { {Rita Frieske} },
title = { whisper-large-v3-cantonese },
year = 2025,
url = { https://huggingface.co/khleeloo/whisper-large-v3-cantonese },
doi = { 10.57967/hf/4393 },
publisher = { Hugging Face }
}
Model Card Authors
https://khleeloo.github.io/