🚀 wav2vec2-base-Toronto_emotional_speech_set
This model is a fine-tuned version of facebook/wav2vec2-base on the audiofolder dataset. It classifies the emotion expressed by a speaker in an audio sample.
✨ Features
- Classify emotions in audio samples.
- Achieve high accuracy, F1 score, recall, and precision on the evaluation set.
📚 Documentation
Model description
This model classifies the emotion expressed when someone speaks in an audio sample.
For more information on how it was created, check out the following link: https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Audio-Projects/Emotion%20Detection/Toronto%20Emotional%20Speech%20Set%20(TESS)/Toronto%20Emotional%20Speech%20Set%20(TESS).ipynb
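As a usage sketch, the model should work with the transformers audio-classification pipeline. The repo id and audio path below are assumptions for illustration, not taken from the original card:

```python
from transformers import pipeline

# Hypothetical Hub repo id; substitute a local checkpoint path or the
# actual repo id if it differs.
classifier = pipeline(
    "audio-classification",
    model="DunnBC22/wav2vec2-base-Toronto_emotional_speech_set",
)

# "speech_sample.wav" is a placeholder path to a 16 kHz mono recording.
predictions = classifier("speech_sample.wav")
print(predictions)  # e.g. [{'label': 'happy', 'score': 0.97}, ...]
```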
Intended uses & limitations
This model is intended to demonstrate my ability to solve a complex problem using technology.
Training and evaluation data
Dataset Source: https://www.kaggle.com/datasets/ejlok1/toronto-emotional-speech-set-tess
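Since the card lists audiofolder as the dataset loader, a plausible loading sketch (the directory layout is an assumption) is:

```python
from datasets import load_dataset

# Assumes the TESS WAV files from Kaggle are arranged one sub-directory per
# emotion label, the layout the audiofolder builder infers labels from.
dataset = load_dataset("audiofolder", data_dir="path/to/TESS")
print(dataset["train"].features)  # includes an inferred `label` ClassLabel
```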
Training procedure
Training hyperparameters
The following hyperparameters were used during training (see the sketch after this list):
- learning_rate: 3e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 15
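For reference, a minimal TrainingArguments sketch reproducing the listed values; output_dir is a placeholder, and the Adam betas and epsilon match the transformers defaults:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec2-base-tess",  # hypothetical output directory
    learning_rate=3e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    gradient_accumulation_steps=4,    # 32 x 4 = total train batch size of 128
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=15,
    # adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-8 are the defaults.
)
```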
Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy | Weighted f1 | Micro f1 | Macro f1 | Weighted recall | Micro recall | Macro recall | Weighted precision | Micro precision | Macro precision |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.9517 | 0.97 | 17 | 1.9432 | 0.2411 | 0.1338 | 0.2411 | 0.1201 | 0.2411 | 0.2411 | 0.2168 | 0.1161 | 0.2411 | 0.1049 |
| 1.9517 | 2.0 | 35 | 1.9036 | 0.3375 | 0.3037 | 0.3375 | 0.3082 | 0.3375 | 0.3375 | 0.3533 | 0.5364 | 0.3375 | 0.5379 |
| 1.9517 | 2.97 | 52 | 1.6629 | 0.4518 | 0.4020 | 0.4518 | 0.3936 | 0.4518 | 0.4518 | 0.4503 | 0.6751 | 0.4518 | 0.6555 |
| 1.9517 | 4.0 | 70 | 1.2026 | 0.7357 | 0.7121 | 0.7357 | 0.6989 | 0.7357 | 0.7357 | 0.7240 | 0.7903 | 0.7357 | 0.7848 |
| 1.9517 | 4.97 | 87 | 0.8458 | 0.8839 | 0.8796 | 0.8839 | 0.8767 | 0.8839 | 0.8839 | 0.8845 | 0.8874 | 0.8839 | 0.8807 |
| 1.9517 | 6.0 | 105 | 0.6493 | 0.8946 | 0.8939 | 0.8946 | 0.8914 | 0.8946 | 0.8946 | 0.8937 | 0.9049 | 0.8946 | 0.9014 |
| 1.9517 | 6.97 | 122 | 0.5149 | 0.9089 | 0.9046 | 0.9089 | 0.8989 | 0.9089 | 0.9089 | 0.8957 | 0.9275 | 0.9089 | 0.9327 |
| 1.9517 | 8.0 | 140 | 0.3814 | 0.9536 | 0.9531 | 0.9536 | 0.9501 | 0.9536 | 0.9536 | 0.9474 | 0.9577 | 0.9536 | 0.9583 |
| 1.9517 | 8.97 | 157 | 0.5627 | 0.85 | 0.8459 | 0.85 | 0.8402 | 0.85 | 0.85 | 0.8378 | 0.9100 | 0.85 | 0.9160 |
| 1.9517 | 10.0 | 175 | 0.4702 | 0.8911 | 0.8861 | 0.8911 | 0.8854 | 0.8911 | 0.8911 | 0.8938 | 0.9021 | 0.8911 | 0.8967 |
| 1.9517 | 10.97 | 192 | 0.3362 | 0.9393 | 0.9376 | 0.9393 | 0.9361 | 0.9393 | 0.9393 | 0.9399 | 0.9402 | 0.9393 | 0.9365 |
| 1.9517 | 12.0 | 210 | 0.3808 | 0.9179 | 0.9181 | 0.9179 | 0.9176 | 0.9179 | 0.9179 | 0.9180 | 0.9251 | 0.9179 | 0.9235 |
| 1.9517 | 12.97 | 227 | 0.4546 | 0.9036 | 0.9045 | 0.9036 | 0.9024 | 0.9036 | 0.9036 | 0.8988 | 0.9151 | 0.9036 | 0.9157 |
| 1.9517 | 14.0 | 245 | 0.5065 | 0.8786 | 0.8826 | 0.8786 | 0.8813 | 0.8786 | 0.8786 | 0.8742 | 0.9040 | 0.8786 | 0.9055 |
| 1.9517 | 14.57 | 255 | 0.4925 | 0.8804 | 0.8837 | 0.8804 | 0.8822 | 0.8804 | 0.8804 | 0.8757 | 0.9044 | 0.8804 | 0.9059 |
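The weighted, micro, and macro averages in the table correspond to scikit-learn's `average` settings. A minimal sketch of a compute_metrics function that would produce these columns (the original notebook may compute them differently):

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

def compute_metrics(eval_pred):
    # eval_pred is a transformers EvalPrediction: (logits, label_ids).
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    metrics = {"accuracy": accuracy_score(labels, preds)}
    for avg in ("weighted", "micro", "macro"):
        metrics[f"{avg}_f1"] = f1_score(labels, preds, average=avg)
        metrics[f"{avg}_recall"] = recall_score(labels, preds, average=avg)
        metrics[f"{avg}_precision"] = precision_score(labels, preds, average=avg)
    return metrics
```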
Framework versions
- Transformers 4.27.4
- Pytorch 2.0.0
- Datasets 2.11.0
- Tokenizers 0.13.3
📄 License
This project is licensed under the Apache-2.0 license.