malaya-speech_Mrbrown_finetune1
This model is a fine-tuned version of [malay-huggingface/wav2vec2-xls-r-300m-mixed](https://huggingface.co/malay-huggingface/wav2vec2-xls-r-300m-mixed) on the uob_singlish dataset. It aims to achieve better performance on speech-related tasks for this specific dataset.
Quick Start
This section gives a brief overview of the model and its performance. The model was fine-tuned on a self-made dataset created by slicing the audio from https://www.youtube.com/watch?v=a2ZOTD3R7JI and writing the corresponding transcripts, for a total duration of about 4 minutes. However, the fine-tuning results were quite poor.
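A minimal inference sketch is shown below. It assumes the checkpoint is hosted on the Hub under an id matching this card's title (a placeholder here; substitute the actual repo id) and that the standard Wav2Vec2 CTC classes used by the base model apply.

```python
# Hedged usage sketch; the repo id below is a placeholder taken from this
# card's title, not a confirmed Hub identifier.
import torch
import librosa
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

model_id = "malaya-speech_Mrbrown_finetune1"  # placeholder repo id
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# XLS-R models expect 16 kHz mono input.
speech, _ = librosa.load("sample.wav", sr=16000)

inputs = processor(speech, sampling_rate=16000, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding.
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids))
```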
Evaluation Results
It achieves the following results on the evaluation set (see the final row of the Training Results table below): a validation loss of 3.8458 and a WER of 1.01 at epoch 100.
Technical Details
Training Hyperparameters
The following hyperparameters were used during training (see the configuration sketch after this list):
- learning_rate: 0.01
- train_batch_size: 2
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 100
- mixed_precision_training: Native AMP
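For illustration, the sketch below shows how these values might map onto Hugging Face `TrainingArguments`. It is not the original training script; the output directory is a placeholder, and the Adam betas/epsilon listed above are the library defaults, so they are not set explicitly.

```python
# Illustrative mapping of the listed hyperparameters onto TrainingArguments;
# not the original training script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./malaya-speech_Mrbrown_finetune1",  # placeholder output path
    learning_rate=1e-2,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,  # effective train batch size: 2 * 2 = 4
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=100,
    fp16=True,  # native AMP mixed-precision training
)
```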
Training Results
| Training Loss | Epoch | Step | Validation Loss | WER  |
|:-------------:|:-----:|:----:|:---------------:|:----:|
| 0.3186        | 20.0  | 200  | 4.2225          | 1.13 |
| 0.4911        | 40.0  | 400  | 4.0427          | 0.99 |
| 0.9014        | 60.0  | 600  | 5.3285          | 1.04 |
| 1.0955        | 80.0  | 800  | 3.6922          | 1.02 |
| 0.7533        | 100.0 | 1000 | 3.8458          | 1.01 |
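The WER column is presumably the standard word error rate; a minimal sketch of how it could be computed with `datasets.load_metric` (available in the Datasets 1.18.x version listed below) is shown here, using hypothetical example strings.

```python
# Minimal WER computation sketch; the strings are hypothetical examples,
# not data from the uob_singlish evaluation set. Requires the `jiwer` package.
from datasets import load_metric

wer_metric = load_metric("wer")

predictions = ["the quick brown fox"]       # example model transcription
references = ["the quick brown fox jumps"]  # example ground-truth transcript

print(wer_metric.compute(predictions=predictions, references=references))
```

Note that WER can exceed 1.0 when the number of edit errors is larger than the number of reference words, which is consistent with the poor results discussed in the analysis below.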
Framework Versions
- Transformers 4.11.3
- Pytorch 1.10.0+cu113
- Datasets 1.18.3
- Tokenizers 0.10.3
Documentation
Model Description
More information needed
Intended Uses & Limitations
More information needed
Training and Evaluation Data
More information needed
Analysis of Poor Fine-Tuning Results
The poor fine-tuning results may imply that the training/fine-tuning dataset needs to be of high quality and at least several hours long. Another possible reason is that the learning rate was set too high (0.01). Further experiments are needed to identify which factors matter most.