WavLM-Basic_n-f-n Open-source Speech Processing Model - Free to Use, with an Accuracy Rate of 73.33% on the Evaluation Set

Wavlm Basic N F N 8batch 5sec 0.0001lr Unfrozen

Developed by reralle

A speech processing model fine-tuned from microsoft/wavlm-large, achieving an accuracy of 73.33% on the evaluation set

Downloads 14

Release Time: 4/27/2023

Model Overview

This model is a speech processing model based on the WavLM architecture, fine-tuned for specific speech recognition or classification tasks

Efficient fine-tuning

Fine-tuned with a learning rate of 0.0001, reaching 73.33% evaluation accuracy on limited data

Stable training

Accuracy rose over training from 16.67% after the first epoch to the reported 73.33%, with some epoch-to-epoch fluctuation; the training log peaks at 80% around epoch 16

Batch optimization

Adopted a batch size of 8 and gradient accumulation steps of 4, resulting in a total training batch size of 32
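
The batch arithmetic above can be checked in a few lines. This is a minimal sketch that assumes training on a single device, which the source does not state; the function name effective_batch_size is hypothetical and not part of any library.

```python
def effective_batch_size(per_device_batch: int,
                         grad_accum_steps: int,
                         num_devices: int = 1) -> int:
    """Number of examples that contribute to each optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices


# Values from the model card: batch size 8, gradient accumulation steps 4.
# num_devices=1 is an assumption; the card does not say how many were used.
total = effective_batch_size(per_device_batch=8, grad_accum_steps=4)
print(total)  # 32
```

With gradient accumulation, gradients from 4 consecutive batches of 8 are summed before each weight update, so the optimizer sees the equivalent of a batch of 32 while only 8 examples are held in memory at a time.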

Voice feature extraction

Speech classification

Speech recognition

Speech processing

Speech emotion recognition

Identify emotion categories in speech

Accuracy 73.33%, F1 score 73.08%

Voice command classification

Classify voice commands
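
The accuracy and F1 figures quoted for emotion recognition can be illustrated with a small pure-Python sketch. The source does not say which F1 averaging was used, so macro averaging here is an assumption, and the emotion labels and predictions are hypothetical toy data, not the model's evaluation set.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging, assumed)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels for illustration only.
y_true = ["angry", "happy", "sad", "happy", "sad", "angry"]
y_pred = ["angry", "happy", "happy", "happy", "sad", "sad"]

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.4f}")
```

Because macro F1 weights every class equally, it can fall below accuracy when some classes are predicted less reliably than others.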

Training Loss	Epoch	Step	Validation Loss	Accuracy	F1
2.3031	0.98	24	2.3002	0.1667	0.1148
2.2766	2.0	49	2.2805	0.15	0.0930
2.2298	2.98	73	2.0679	0.2333	0.1421
1.9839	4.0	98	1.8757	0.25	0.1380
1.7495	4.98	122	1.5981	0.4	0.3370
1.5318	6.0	147	1.4640	0.45	0.3698
1.2765	6.98	171	1.3181	0.5167	0.4437
1.261	8.0	196	1.0905	0.5833	0.5429
1.078	8.98	220	1.0944	0.55	0.5244
0.9116	10.0	245	0.8228	0.6167	0.5603
0.8973	10.98	269	0.8632	0.5833	0.5266
0.8033	12.0	294	0.9061	0.65	0.6398
0.7183	12.98	318	0.8047	0.7	0.6877
0.7526	14.0	343	0.6695	0.7333	0.7176
0.6381	14.98	367	0.7510	0.7833	0.7788
0.5266	16.0	392	0.6154	0.8	0.7901
0.4485	16.98	416	0.8614	0.75	0.7359
0.5123	18.0	441	1.0848	0.65	0.6306
0.4094	18.98	465	0.6748	0.7667	0.7680
0.3114	20.0	490	0.7406	0.75	0.7389
0.2668	20.98	514	0.8419	0.75	0.7424
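
The log above can be scanned programmatically to find the strongest checkpoints. The following is a minimal sketch: the rows are copied from the table (training loss omitted), while the EvalRow type and variable names are hypothetical helpers introduced only for illustration.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    epoch: float
    step: int
    val_loss: float
    accuracy: float
    f1: float


# (epoch, step, validation loss, accuracy, F1) copied from the table above.
LOG = [EvalRow(*row) for row in [
    (0.98, 24, 2.3002, 0.1667, 0.1148),
    (2.0, 49, 2.2805, 0.15, 0.0930),
    (2.98, 73, 2.0679, 0.2333, 0.1421),
    (4.0, 98, 1.8757, 0.25, 0.1380),
    (4.98, 122, 1.5981, 0.4, 0.3370),
    (6.0, 147, 1.4640, 0.45, 0.3698),
    (6.98, 171, 1.3181, 0.5167, 0.4437),
    (8.0, 196, 1.0905, 0.5833, 0.5429),
    (8.98, 220, 1.0944, 0.55, 0.5244),
    (10.0, 245, 0.8228, 0.6167, 0.5603),
    (10.98, 269, 0.8632, 0.5833, 0.5266),
    (12.0, 294, 0.9061, 0.65, 0.6398),
    (12.98, 318, 0.8047, 0.7, 0.6877),
    (14.0, 343, 0.6695, 0.7333, 0.7176),
    (14.98, 367, 0.7510, 0.7833, 0.7788),
    (16.0, 392, 0.6154, 0.8, 0.7901),
    (16.98, 416, 0.8614, 0.75, 0.7359),
    (18.0, 441, 1.0848, 0.65, 0.6306),
    (18.98, 465, 0.6748, 0.7667, 0.7680),
    (20.0, 490, 0.7406, 0.75, 0.7389),
    (20.98, 514, 0.8419, 0.75, 0.7424),
]]

best_by_accuracy = max(LOG, key=lambda r: r.accuracy)
best_by_val_loss = min(LOG, key=lambda r: r.val_loss)
final = LOG[-1]

print(best_by_accuracy.step, best_by_accuracy.accuracy)  # 392 0.8
print(best_by_val_loss.step, best_by_val_loss.val_loss)  # 392 0.6154
print(final.step, final.accuracy)                        # 514 0.75
```

In this log, the checkpoint at step 392 has both the highest accuracy and the lowest validation loss. The headline 73.33% accuracy matches the row at step 343, and validation loss becomes noisier after step 392 while training loss keeps falling.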
