# whisper-large-v3-turbo-common_voice_19_0-zh-TW
This is an open-source Traditional Chinese (Taiwan) automatic speech recognition (ASR) model, fine-tuned from openai/whisper-large-v3-turbo on the JacobLinCool/common_voice_19_0_zh-TW dataset, offering reliable speech recognition capabilities.
## 🚀 Quick Start
This model is a fine-tuned version of openai/whisper-large-v3-turbo on the JacobLinCool/common_voice_19_0_zh-TW dataset.
It achieves the following results on the evaluation set:
- Loss: 0.1786
- Wer: 32.5554
- Cer: 8.6009
- Decode Runtime: 90.9833
- Wer Runtime: 0.1257
- Cer Runtime: 0.1534
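A minimal inference sketch with the 🤗 Transformers pipeline is shown below. The repository id is assumed from the model name, and the audio file is a placeholder; note also that the framework versions below list PEFT, so the repository may ship a LoRA adapter rather than merged weights.

```python
import torch
from transformers import pipeline

# Minimal inference sketch. The repo id is assumed from the model name;
# if the repository ships only a PEFT/LoRA adapter, merge or load it onto
# openai/whisper-large-v3-turbo first.
asr = pipeline(
    "automatic-speech-recognition",
    model="JacobLinCool/whisper-large-v3-turbo-common_voice_19_0-zh-TW",
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device="cuda:0" if torch.cuda.is_available() else "cpu",
)

print(asr("sample.wav")["text"])  # "sample.wav" is a placeholder audio file
```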
## ✨ Features
- Prompt-free ASR: Designed to be a prompt-free ASR model for Traditional Chinese.
- Inherited LID system: Inherits the language identification (LID) system from Whisper, which supports other Chinese language variants under the same language token (`zh`).
- Open-source and free: The model is free to use under the MIT license.
## 📚 Documentation
### Model description
This is an open-source Traditional Chinese (Taiwan) automatic speech recognition (ASR) model.
### Intended uses & limitations
This model is designed to be a prompt-free ASR model for Traditional Chinese. Because it inherits Whisper's language identification (LID) system, which groups other Chinese language variants under the same language token (`zh`), performance may degrade when transcribing Simplified Chinese.
The model is free to use under the MIT license.
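Because decoding defaults to Whisper's automatic LID, pinning the language and task explicitly is a reasonable safeguard. A sketch reusing the pipeline from the Quick Start above:

```python
# Force the shared `zh` language token and the transcribe task instead of
# relying on automatic language identification.
result = asr(
    "sample.wav",  # placeholder audio file
    generate_kwargs={"language": "zh", "task": "transcribe"},
)
print(result["text"])
```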
### Training and evaluation data
This model was trained on the Common Voice Corpus 19.0 Chinese (Taiwan) subset, containing about 50k training examples (44 hours) and 5k test examples (5 hours). This dataset is four times larger than the combined training and validation split (`train+validation`) of [mozilla-foundation/common_voice_16_1](https://huggingface.co/datasets/mozilla-foundation/common_voice_16_1), which includes about 12k examples.
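For reference, the subset can be loaded with 🤗 Datasets. A minimal sketch, with split names assumed to follow the usual Common Voice layout:

```python
from datasets import Audio, load_dataset

# Load the zh-TW subset (split names assumed to follow Common Voice conventions).
ds = load_dataset("JacobLinCool/common_voice_19_0_zh-TW")
print(ds)  # expect roughly 50k train / 5k test examples

# Whisper models consume 16 kHz audio, so resample the audio column.
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))
```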
### Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 4
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 50
- training_steps: 5000
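For readers reproducing the run, these settings map onto standard Seq2SeqTrainingArguments roughly as follows. This is a sketch, not the actual training script:

```python
from transformers import Seq2SeqTrainingArguments

# Rough mapping of the listed hyperparameters onto Seq2SeqTrainingArguments;
# the actual run may have used a different toolkit, so details can differ.
training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-large-v3-turbo-zh-TW",  # placeholder path
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=8,  # 4 x 8 = total train batch size 32
    seed=42,
    optim="adamw_torch",  # AdamW, betas=(0.9, 0.999), eps=1e-8
    lr_scheduler_type="linear",
    warmup_steps=50,
    max_steps=5000,
)
```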
### Training results
| Training Loss | Epoch | Step | Validation Loss | Wer | Cer | Decode Runtime | Wer Runtime | Cer Runtime |
|---|---|---|---|---|---|---|---|---|
| No log | 0 | 0 | 2.7208 | 76.5011 | 20.4851 | 89.4916 | 0.1213 | 0.1639 |
| 1.1832 | 0.1 | 500 | 0.1939 | 39.9561 | 10.8721 | 90.0926 | 0.1222 | 0.1555 |
| 1.5179 | 0.2 | 1000 | 0.1774 | 37.6621 | 9.9322 | 89.8657 | 0.1225 | 0.1545 |
| 0.6179 | 0.3 | 1500 | 0.1796 | 36.2657 | 9.8325 | 90.2480 | 0.1198 | 0.1573 |
| 0.3626 | 1.0912 | 2000 | 0.1846 | 36.2258 | 9.7801 | 90.3306 | 0.1196 | 0.1539 |
| 0.1311 | 1.1912 | 2500 | 0.1776 | 34.8095 | 9.3214 | 90.3124 | 0.1286 | 0.1610 |
| 0.1263 | 1.2912 | 3000 | 0.1763 | 36.1261 | 9.3563 | 90.4271 | 0.1330 | 0.1650 |
| 0.2194 | 2.0825 | 3500 | 0.1891 | 34.6898 | 9.3114 | 91.1932 | 0.1320 | 0.1643 |
| 0.1127 | 2.1825 | 4000 | 0.1838 | 34.0714 | 9.1095 | 90.2416 | 0.1196 | 0.1529 |
| 0.3792 | 2.2824 | 4500 | 0.1786 | 33.1339 | 8.7679 | 90.9144 | 0.1310 | 0.1550 |
| 0.0606 | 3.0737 | 5000 | 0.1786 | 32.5554 | 8.6009 | 90.9833 | 0.1257 | 0.1534 |
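The Wer and Cer columns appear to be percentages. A typical computation with the 🤗 Evaluate library looks like this (the example strings are illustrative, not from the eval set):

```python
import evaluate

# Typical WER/CER computation for reports like the table above.
wer = evaluate.load("wer")
cer = evaluate.load("cer")

predictions = ["今天 天氣 很好"]  # hypothesis transcripts (illustrative)
references = ["今天 天氣 真好"]   # reference transcripts (illustrative)

print("WER:", 100 * wer.compute(predictions=predictions, references=references))
print("CER:", 100 * cer.compute(predictions=predictions, references=references))
```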
### Framework versions
- PEFT 0.13.2
- Transformers 4.46.1
- PyTorch 2.4.0
- Datasets 3.0.2
- Tokenizers 0.20.1
## 📄 License
This model is released under the MIT license.
## 📋 Model Information
| Property | Details |
|---|---|
| Library Name | transformers |
| Model Type | whisper-large-v3-turbo-common_voice_19_0-zh-TW |
| Base Model | openai/whisper-large-v3-turbo |
| Tags | wft, whisper, automatic-speech-recognition, audio, speech, generated_from_trainer |
| Datasets | JacobLinCool/common_voice_19_0_zh-TW |
| Metrics | wer |