Wav2vec2-final-1-lm-2 Open-source Speech Recognition Model - Accurately Recognize Speech Content with Low Error Rate

Wav2vec2 Final 1 Lm 2

Developed by chrisvinsen

A speech recognition model fine-tuned from facebook/wav2vec2-base, with a Word Error Rate (WER) of 0.283, or 0.126 when decoded with a 3-gram language model.
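WER is the number of word substitutions, deletions and insertions needed to turn a transcript into the reference, divided by the number of reference words. A minimal Python sketch of that standard definition follows. The function name word_error_rate is illustrative and not part of this model's tooling. It also skips the text normalization (casing, punctuation) that real evaluation scripts usually apply.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = word-level edit distance between the reference prefix seen so far
    # and the first j hypothesis words (classic Levenshtein dynamic programming).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion of a reference word
                          curr[j - 1] + 1,     # insertion of an extra word
                          prev[j - 1] + cost)  # substitution, or match when cost == 0
        prev = curr
    return prev[-1] / len(ref)


# One substitution (sat -> sit) and one deletion ("the"): 2 errors / 6 reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

On this scale, a WER of 0.283 corresponds to roughly 28 word errors per 100 reference words.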

Downloads 15

Release Time: 6/2/2022

Model Overview

This is a speech recognition model built on the wav2vec2 architecture and fine-tuned on a specific dataset.

Low Word Error Rate

The Word Error Rate on the evaluation set is 0.4499, falling to 0.126 when decoding with a 3-gram language model.
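The page does not say how the 3-gram model was combined with the acoustic model. In practice it is usually applied inside CTC beam search, for example with a KenLM model. The sketch below shows the underlying idea on a short list of finished hypotheses. A word-level trigram model with add-k smoothing scores each candidate, and that score is added to the acoustic log-probability with a weight alpha and a per-word bonus beta. Everything here is hypothetical: the names TrigramLM and rescore, the values of k, alpha and beta, the tiny training corpus, and the acoustic scores.

```python
import math
from collections import Counter


class TrigramLM:
    """Toy word-level 3-gram model with add-k smoothing (illustrative only)."""

    def __init__(self, sentences, k=0.1):
        self.k = k
        self.trigrams = Counter()   # counts of (w1, w2, w3)
        self.contexts = Counter()   # counts of the (w1, w2) context
        self.vocab = {"</s>"}
        for sentence in sentences:
            words = ["<s>", "<s>"] + sentence.split() + ["</s>"]
            self.vocab.update(words[2:])
            for i in range(2, len(words)):
                context = (words[i - 2], words[i - 1])
                self.trigrams[context + (words[i],)] += 1
                self.contexts[context] += 1

    def log_prob(self, sentence):
        """Sum of log P(w_i | w_{i-2}, w_{i-1}) over the sentence, including </s>."""
        words = ["<s>", "<s>"] + sentence.split() + ["</s>"]
        v = len(self.vocab)
        total = 0.0
        for i in range(2, len(words)):
            context = (words[i - 2], words[i - 1])
            numerator = self.trigrams[context + (words[i],)] + self.k
            denominator = self.contexts[context] + self.k * v
            total += math.log(numerator / denominator)
        return total


def rescore(candidates, lm, alpha=0.5, beta=1.0):
    """Return the text maximizing acoustic log-prob + alpha * LM log-prob + beta * word count."""
    def score(item):
        text, acoustic_log_prob = item
        return acoustic_log_prob + alpha * lm.log_prob(text) + beta * len(text.split())
    return max(candidates, key=score)[0]


lm = TrigramLM(["turn on the light", "turn off the light", "the light is on"])
# Hypothetical acoustic scores: the acoustic model alone slightly prefers the misspelling.
candidates = [("turn of the light", -4.0), ("turn off the light", -4.2)]
print(rescore(candidates, lm))  # turn off the light
```

With alpha=0 and beta=0 the acoustic score alone decides and "turn of the light" wins. The language model is what pulls the choice toward the word sequence it has seen in text, which is the mechanism behind WER gains from n-gram decoding.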

Based on wav2vec2 Architecture

Uses Facebook's wav2vec2-base (facebook/wav2vec2-base) as the starting checkpoint for fine-tuning.
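wav2vec2-base produces frame-level audio representations and needs an output head for transcription. Speech-recognition fine-tunes of it typically add a CTC head over a character vocabulary. Without a language model, the output is then read with greedy (best-path) CTC decoding. The page does not document this model's head or vocabulary, so the sketch assumes a CTC head. It also assumes the Hugging Face Wav2Vec2CTCTokenizer conventions of "<pad>" as the CTC blank and "|" as the word delimiter. The function name and the toy five-symbol vocabulary are hypothetical.

```python
from typing import List, Sequence


def ctc_greedy_decode(frame_logits: Sequence[Sequence[float]],
                      vocab: List[str],
                      blank: str = "<pad>",
                      word_delimiter: str = "|") -> str:
    """Best-path CTC decoding: argmax per frame, collapse repeats, drop blanks."""
    # 1. Pick the most likely vocabulary index for every audio frame.
    best = [max(range(len(row)), key=lambda j: row[j]) for row in frame_logits]
    # 2. Collapse consecutive duplicates, then remove blanks. Because `previous`
    #    is also updated on blank frames, a blank between two equal symbols
    #    keeps both of them (this is how CTC spells double letters).
    tokens = []
    previous = None
    for idx in best:
        if idx != previous and vocab[idx] != blank:
            tokens.append(vocab[idx])
        previous = idx
    # 3. Map the word-delimiter token to spaces.
    return "".join(tokens).replace(word_delimiter, " ").strip()


vocab = ["<pad>", "|", "h", "i", "o"]
path = [2, 2, 3, 0, 1, 2, 0, 4, 4]  # per-frame winners: h h i <pad> | h <pad> o o
logits = [[1.0 if j == k else 0.0 for j in range(len(vocab))] for k in path]
print(ctc_greedy_decode(logits, vocab))  # hi ho
```

Greedy decoding picks each frame independently. A language model, such as the 3-gram mentioned on this page, can only help when the decoder keeps several candidate paths (beam search) instead of one.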

Optimized Training

Trained for 60 epochs with a linear learning-rate schedule and warm-up.
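A "linear schedule with warm-up" usually means two phases. The learning rate first ramps up linearly from 0 to its peak over a fixed number of warm-up steps, then decays linearly back to 0 by the last training step. This is the shape of the "linear" scheduler in Hugging Face transformers (get_linear_schedule_with_warmup), but the page does not say which implementation was used. The sketch below is a plain-Python version of that shape. BASE_LR and WARMUP_STEPS are hypothetical because the page gives neither value. TOTAL_STEPS assumes about 146 optimizer steps per epoch, estimated from the training table, where step 8400 falls at epoch 57.53.

```python
def linear_warmup_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate at `step` for linear warm-up followed by linear decay to zero."""
    if step < warmup_steps:
        # Warm-up: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from base_lr to 0 at total_steps, then stay at 0.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical settings: the peak learning rate and warm-up length are not stated.
BASE_LR = 1e-4
WARMUP_STEPS = 500
# 60 epochs x ~146 steps per epoch (estimated from the training table).
TOTAL_STEPS = 60 * 146

for step in (0, 250, 500, 4630, TOTAL_STEPS):
    print(step, linear_warmup_lr(step, BASE_LR, WARMUP_STEPS, TOTAL_STEPS))
```

Warm-up keeps early updates small while the newly initialized output head is still random. The linear decay then shrinks the step size as training approaches the final checkpoint.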

Speech Recognition

Audio to Text Conversion

Speech Transcription

Meeting Minutes Transcription

Convert meeting recordings into text transcripts

Word Error Rate 0.283

Voice Command Recognition

Recognize and understand voice commands

Training Results

Training Loss	Epoch	Step	Validation Loss	WER
3.4816	2.74	400	1.0717	0.8927
0.751	5.48	800	0.7155	0.7533
0.517	8.22	1200	0.7039	0.6675
0.3988	10.96	1600	0.5935	0.6149
0.3179	13.7	2000	0.6477	0.5999
0.2755	16.44	2400	0.5549	0.5798
0.2343	19.18	2800	0.6626	0.5798
0.2103	21.92	3200	0.6488	0.5674
0.1877	24.66	3600	0.5874	0.5339
0.1719	27.4	4000	0.6354	0.5389
0.1603	30.14	4400	0.6612	0.5210
0.1401	32.88	4800	0.6676	0.5131
0.1286	35.62	5200	0.6366	0.5075
0.1159	38.36	5600	0.6064	0.4977
0.1084	41.1	6000	0.6530	0.4835
0.0974	43.84	6400	0.6118	0.4853
0.0879	46.58	6800	0.6316	0.4770
0.0815	49.32	7200	0.6125	0.4664
0.0708	52.05	7600	0.6449	0.4683
0.0651	54.79	8000	0.6068	0.4571
0.0555	57.53	8400	0.6305	0.4499
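The table logs an evaluation every 400 steps. The short Python sketch below parses the rows above, copied verbatim, and derives a few figures that are easy to miss when scanning them: the step with the lowest validation loss, the step with the lowest WER, the implied number of steps per epoch, and the relative WER drop over the run. The variable names are illustrative.

```python
TRAINING_LOG = """\
3.4816  2.74   400   1.0717  0.8927
0.751   5.48   800   0.7155  0.7533
0.517   8.22   1200  0.7039  0.6675
0.3988  10.96  1600  0.5935  0.6149
0.3179  13.7   2000  0.6477  0.5999
0.2755  16.44  2400  0.5549  0.5798
0.2343  19.18  2800  0.6626  0.5798
0.2103  21.92  3200  0.6488  0.5674
0.1877  24.66  3600  0.5874  0.5339
0.1719  27.4   4000  0.6354  0.5389
0.1603  30.14  4400  0.6612  0.5210
0.1401  32.88  4800  0.6676  0.5131
0.1286  35.62  5200  0.6366  0.5075
0.1159  38.36  5600  0.6064  0.4977
0.1084  41.1   6000  0.6530  0.4835
0.0974  43.84  6400  0.6118  0.4853
0.0879  46.58  6800  0.6316  0.4770
0.0815  49.32  7200  0.6125  0.4664
0.0708  52.05  7600  0.6449  0.4683
0.0651  54.79  8000  0.6068  0.4571
0.0555  57.53  8400  0.6305  0.4499
"""

# Columns: training loss, epoch, step, validation loss, WER.
rows = []
for line in TRAINING_LOG.strip().splitlines():
    train_loss, epoch, step, val_loss, wer = map(float, line.split())
    rows.append({"train_loss": train_loss, "epoch": epoch, "step": int(step),
                 "val_loss": val_loss, "wer": wer})

best_val = min(rows, key=lambda r: r["val_loss"])
best_wer = min(rows, key=lambda r: r["wer"])
steps_per_epoch = rows[-1]["step"] / rows[-1]["epoch"]
relative_drop = (rows[0]["wer"] - best_wer["wer"]) / rows[0]["wer"]

print(f"lowest validation loss: {best_val['val_loss']} at step {best_val['step']}")  # 0.5549 at step 2400
print(f"lowest WER: {best_wer['wer']} at step {best_wer['step']}")                  # 0.4499 at step 8400
print(f"~{steps_per_epoch:.0f} optimizer steps per epoch")                           # ~146
print(f"WER fell by {relative_drop:.1%} between the first and best evaluation")     # 49.6%
```

Validation loss bottoms out at step 2400, while WER keeps improving until the last logged step. Choosing a checkpoint by validation loss would therefore pick a much earlier model than choosing by WER.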
