Open-source speech recognition model fine-tuned from wav2vec2-large-xlsr-53 on training data with 10 ms audio masking

Wav2vec2 Large Xlsr 53 Toy Train Data Masked Audio 10ms

Developed by scasutt

Speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on training data with 10 ms audio masking

Downloads 22

Release Time: 3/28/2022

Model Overview

This model is a fine-tuned version of facebook/wav2vec2-large-xlsr-53 for speech recognition. At the end of training it reaches a word error rate of 0.4929 on the validation set.

10ms audio masked training

Trained on audio in which 10 ms segments are masked, which may improve the model's ability to recognize short-duration audio features
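The card does not describe how the masking was applied, so the following is only a minimal sketch of one plausible reading: zeroing out randomly placed 10 ms spans of a waveform. The function name `mask_audio`, the `num_masks` parameter, and the choice to replace samples with zeros are all assumptions for illustration. The 16 kHz sample rate is the rate wav2vec2 models expect, which makes a 10 ms span 160 samples long.

```python
import random

SAMPLE_RATE = 16_000  # wav2vec2 models expect 16 kHz input
MASK_MS = 10          # mask length from the model name


def mask_audio(samples, num_masks=1, sample_rate=SAMPLE_RATE,
               mask_ms=MASK_MS, seed=None):
    """Return a copy of `samples` with `num_masks` random spans set to 0.0.

    Hypothetical helper: the real training pipeline is not documented.
    Spans may overlap when num_masks > 1.
    """
    rng = random.Random(seed)
    span = sample_rate * mask_ms // 1000  # 160 samples at 16 kHz
    masked = list(samples)
    if len(masked) < span:
        return masked  # clip shorter than one mask: leave untouched
    for _ in range(num_masks):
        start = rng.randrange(0, len(masked) - span + 1)
        masked[start:start + span] = [0.0] * span
    return masked


# One second of constant signal with a single masked span.
one_second = [1.0] * SAMPLE_RATE
masked = mask_audio(one_second, num_masks=1, seed=0)
```

With a single mask, exactly 160 of the 16,000 samples are silenced, which is 1% of a one-second clip.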

Fine-tuning optimization

Fine-tuned from the pre-trained facebook/wav2vec2-large-xlsr-53 checkpoint. Over about 20 epochs (4,750 steps), the validation word error rate falls from 1.0 to 0.4929.

Speech recognition

Audio feature extraction

Speech-to-text

Speech transcription

Convert speech content into text
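Transcription with a model like this is typically done through the Hugging Face `transformers` ASR pipeline. The sketch below assumes the model is published under the repository ID shown in `MODEL_ID`; that ID is inferred from the developer and model name on this page and is not confirmed here, so check it before use. The `transcribe` helper is a hypothetical wrapper, and decoding an audio file from a path also requires ffmpeg to be installed.

```python
# Assumed repository ID, inferred from the developer and model name above.
MODEL_ID = "scasutt/wav2vec2-large-xlsr-53_toy_train_data_masked_audio_10ms"


def transcribe(audio_path, model_id=MODEL_ID):
    """Transcribe one audio file and return the recognized text.

    Hypothetical helper. The import is done lazily so that defining the
    function does not require transformers to be installed.
    """
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    return asr(audio_path)["text"]
```

Calling `transcribe("sample.wav")` downloads the model on first use. With a word error rate near 0.49, roughly every second word can be expected to need correction.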

Word error rate (WER): 0.4929 on the validation set at the final checkpoint (step 4750)
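For reference, word error rate is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below is a plain-Python illustration of that standard definition; the function name `word_error_rate` is chosen for this example, and it is not necessarily the exact implementation used to produce the figure above.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


perfect = word_error_rate("the cat sat", "the cat sat")
one_sub = word_error_rate("the cat sat", "the dog sat")
empty_output = word_error_rate("the cat sat", "")
```

An empty or entirely wrong output of the same length gives a WER of 1.0, which matches the first two rows of the training log, where the model has not yet learned to emit usable text.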

Training Loss	Epoch	Step	Validation Loss	WER
3.4049	1.05	250	3.3497	1.0
3.0851	2.1	500	3.4440	1.0
2.3512	3.15	750	1.5938	0.9317
1.1762	4.2	1000	0.8481	0.7333
0.903	5.25	1250	0.7180	0.6484
0.6754	6.3	1500	0.6603	0.6044
0.5961	7.35	1750	0.6410	0.5778
0.5325	8.4	2000	0.6245	0.5545
0.4685	9.45	2250	0.5925	0.5359
0.4526	10.5	2500	0.5991	0.5345
0.3975	11.55	2750	0.5916	0.5228
0.3672	12.6	3000	0.5882	0.5037
0.3774	13.65	3250	0.5693	0.5028
0.3489	14.7	3500	0.5645	0.5018
0.3593	15.75	3750	0.5977	0.5043
0.3167	16.81	4000	0.6049	0.5018
0.3225	17.86	4250	0.6172	0.4921
0.2807	18.91	4500	0.5937	0.4923
0.2889	19.96	4750	0.5945	0.4929
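The log above can be scanned programmatically to see where training was most effective. The following sketch copies the step, validation loss, and WER columns from the table and picks out the checkpoints with the lowest WER and the lowest validation loss. The variable names are chosen for this example only.

```python
# (step, validation_loss, wer) copied from the training log above.
LOG = [
    (250, 3.3497, 1.0), (500, 3.4440, 1.0), (750, 1.5938, 0.9317),
    (1000, 0.8481, 0.7333), (1250, 0.7180, 0.6484), (1500, 0.6603, 0.6044),
    (1750, 0.6410, 0.5778), (2000, 0.6245, 0.5545), (2250, 0.5925, 0.5359),
    (2500, 0.5991, 0.5345), (2750, 0.5916, 0.5228), (3000, 0.5882, 0.5037),
    (3250, 0.5693, 0.5028), (3500, 0.5645, 0.5018), (3750, 0.5977, 0.5043),
    (4000, 0.6049, 0.5018), (4250, 0.6172, 0.4921), (4500, 0.5937, 0.4923),
    (4750, 0.5945, 0.4929),
]

# Checkpoint with the lowest word error rate.
best_wer_step, _, best_wer = min(LOG, key=lambda row: row[2])

# Checkpoint with the lowest validation loss.
best_loss_step, best_loss, _ = min(LOG, key=lambda row: row[1])

# WER at the last logged step, which is the figure reported for the model.
final_step, _, final_wer = LOG[-1]

print(f"lowest WER {best_wer} at step {best_wer_step}")
print(f"lowest validation loss {best_loss} at step {best_loss_step}")
print(f"final WER {final_wer} at step {final_step}")
```

The lowest WER (0.4921) occurs at step 4250 and the lowest validation loss (0.5645) at step 3500, while the reported figure of 0.4929 comes from the final step. Validation loss stops improving after step 3500 even though training loss keeps falling, so the later epochs bring only marginal WER gains.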
