whisper-large-v3-myanmar
This model is a fine-tuned version of openai/whisper-large-v3 on the chuuhtetnaing/myanmar-speech-dataset-openslr-80 dataset. It is designed for automatic speech recognition of the Myanmar (Burmese) language.
Quick Start
The model achieves the following results on the evaluation set:
- Loss: 0.1752
- WER: 54.8976
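For a quick check on your own audio, the checkpoint can be loaded through the transformers pipeline. This is a minimal sketch; `example.wav` is a placeholder path, and decoding a local file this way relies on ffmpeg being available on the system.

```python
from transformers import pipeline

# Load the fine-tuned checkpoint as an automatic-speech-recognition pipeline
pipe = pipeline(
    "automatic-speech-recognition",
    model="chuuhtetnaing/whisper-large-v3-myanmar",
)

# "example.wav" is a placeholder; point this at any local Myanmar-language recording
output = pipe(
    "example.wav",
    generate_kwargs={"language": "myanmar", "task": "transcribe"},
)
print(output["text"])
```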
Usage Examples
Basic Usage
from datasets import Audio, load_dataset
from transformers import pipeline

# Load the dataset and resample the audio column to 16 kHz (the rate Whisper expects)
dataset = load_dataset("chuuhtetnaing/myanmar-speech-dataset-openslr-80")
dataset = dataset.cast_column("audio", Audio(sampling_rate=16000))
test_dataset = dataset['test']

# Pick one sample from the test split
input_speech = test_dataset[42]['audio']

# Build an ASR pipeline from the fine-tuned checkpoint and transcribe the sample
pipe = pipeline(model='chuuhtetnaing/whisper-large-v3-myanmar')
output = pipe(input_speech, generate_kwargs={"language": "myanmar", "task": "transcribe"})
print(output['text'])
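To reproduce a WER figure like the one reported above, the pipeline predictions can be scored with the evaluate library. The sketch below is not the card's original evaluation script; in particular, the transcript column is assumed to be named `sentence` — check the dataset card for the actual field name.

```python
import evaluate

# Assumes `pipe` and `test_dataset` from the example above are already defined
wer_metric = evaluate.load("wer")

predictions, references = [], []
for sample in test_dataset:
    result = pipe(
        sample["audio"],
        generate_kwargs={"language": "myanmar", "task": "transcribe"},
    )
    predictions.append(result["text"])
    # "sentence" is an assumed column name; adjust to the dataset's schema
    references.append(sample["sentence"])

wer = wer_metric.compute(predictions=predictions, references=references)
# The card reports WER on a percentage scale, hence the factor of 100
print(f"WER: {100 * wer:.4f}")
```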
Technical Details
Training hyperparameters
The following hyperparameters were used during training (an approximate Seq2SeqTrainingArguments sketch follows the list):
- learning_rate: 0.0003
- train_batch_size: 20
- eval_batch_size: 20
- seed: 42
- gradient_accumulation_steps: 3
- total_train_batch_size: 60
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 200
- num_epochs: 30
- mixed_precision_training: Native AMP
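These settings map roughly onto transformers Seq2SeqTrainingArguments as sketched below. This is a reconstruction for reference only; `output_dir`, the evaluation strategy, and `predict_with_generate` are assumptions not stated in this card.

```python
from transformers import Seq2SeqTrainingArguments

# Approximate reconstruction of the reported hyperparameters
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-large-v3-myanmar",  # assumed
    learning_rate=3e-4,
    per_device_train_batch_size=20,
    per_device_eval_batch_size=20,
    gradient_accumulation_steps=3,   # gives the total train batch size of 60
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=200,
    num_train_epochs=30,
    fp16=True,                       # mixed precision (native AMP)
    evaluation_strategy="epoch",     # assumed; the card reports per-epoch results
    predict_with_generate=True,      # assumed, needed to compute WER during eval
)
```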
Training results
| Training Loss | Epoch | Step | Validation Loss | WER |
|---|---|---|---|---|
| 0.9771 | 1.0 | 42 | 0.7598 | 100.0 |
| 0.3477 | 2.0 | 84 | 0.2140 | 89.8931 |
| 0.2244 | 3.0 | 126 | 0.1816 | 79.0294 |
| 0.1287 | 4.0 | 168 | 0.1510 | 71.9947 |
| 0.1029 | 5.0 | 210 | 0.1575 | 77.8718 |
| 0.0797 | 6.0 | 252 | 0.1315 | 70.5254 |
| 0.0511 | 7.0 | 294 | 0.1143 | 70.5699 |
| 0.03 | 8.0 | 336 | 0.1154 | 68.1656 |
| 0.0211 | 9.0 | 378 | 0.1289 | 69.1897 |
| 0.0151 | 10.0 | 420 | 0.1318 | 66.7854 |
| 0.0113 | 11.0 | 462 | 0.1478 | 69.1451 |
| 0.0079 | 12.0 | 504 | 0.1484 | 66.2066 |
| 0.0053 | 13.0 | 546 | 0.1389 | 65.0935 |
| 0.0031 | 14.0 | 588 | 0.1479 | 64.3811 |
| 0.0014 | 15.0 | 630 | 0.1611 | 64.8264 |
| 0.001 | 16.0 | 672 | 0.1627 | 63.3571 |
| 0.0012 | 17.0 | 714 | 0.1546 | 65.0045 |
| 0.0006 | 18.0 | 756 | 0.1566 | 64.5147 |
| 0.0006 | 20.0 | 760 | 0.1581 | 64.6928 |
| 0.0002 | 21.0 | 798 | 0.1621 | 63.9804 |
| 0.0003 | 22.0 | 836 | 0.1664 | 60.8638 |
| 0.0002 | 23.0 | 874 | 0.1663 | 58.5040 |
| 0.0 | 24.0 | 912 | 0.1699 | 55.8326 |
| 0.0 | 25.0 | 950 | 0.1715 | 55.0312 |
| 0.0 | 26.0 | 988 | 0.1730 | 54.9866 |
| 0.0 | 27.0 | 1026 | 0.1740 | 54.8976 |
| 0.0 | 28.0 | 1064 | 0.1747 | 54.8976 |
| 0.0 | 29.0 | 1102 | 0.1751 | 54.8976 |
| 0.0 | 30.0 | 1140 | 0.1752 | 54.8976 |
Framework versions
- Transformers 4.35.2
- Pytorch 2.1.1+cu121
- Datasets 2.14.5
- Tokenizers 0.15.1
License
This model is licensed under the Apache 2.0 license.
| Property | Details |
|---|---|
| Model Type | Fine-tuned version of openai/whisper-large-v3 |
| Training Data | chuuhtetnaing/myanmar-speech-dataset-openslr-80 |