Fine-tuned wav2vec2-base Speech Recognition Model: Open Source, Trained on a 10% Data Subset

Wav2vec2 Base Toy Train Data Fast 10pct

Developed by scasutt

This speech recognition model is a fine-tuned version of facebook/wav2vec2-base. The fine-tuning dataset is not documented; training used a 10% subset of the data.

Downloads 22

Release Date: March 26, 2022

Model Overview

A fine-tuned Automatic Speech Recognition (ASR) model based on the wav2vec2 architecture, intended for English speech-to-text tasks.
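As a rough illustration of how a checkpoint like this is usually loaded, the sketch below wraps the Hugging Face transformers ASR pipeline. The repository ID is inferred from the developer and model name on this page and is an assumption to verify before use; the helper name transcribe and the audio path in the comment are hypothetical.

```python
# Assumed Hub repository ID, inferred from the developer and model name on
# this page; it is not stated in the source and should be verified.
MODEL_ID = "scasutt/wav2vec2-base_toy_train_data_fast_10pct"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file with the transformers ASR pipeline."""
    # Imported lazily so this module can be loaded without transformers.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # For a file path, the pipeline decodes the audio (ffmpeg required) and
    # resamples it to the sampling rate the feature extractor expects.
    result = asr(audio_path)
    return result["text"]


# Example call (hypothetical file name):
#   print(transcribe("voice_memo.wav"))
```

Calling transcribe downloads the model weights on first use, so it needs network access and an installed transformers and PyTorch environment.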

Efficient Training

Trained using a 10% data subset, suitable for rapid prototyping

Based on wav2vec2 Architecture

Builds on wav2vec 2.0, the self-supervised speech representation learning architecture developed by Facebook AI Research

Linear Learning Rate Scheduling

Employs linear learning rate scheduling with warmup during training
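A linear schedule with warmup can be sketched in a few lines: the learning rate rises linearly from zero to a peak over the warmup steps, then decays linearly back to zero by the final step. The peak rate, warmup length, and total step count used below are hypothetical values chosen for illustration; this page does not report the ones used in training.

```python
def linear_schedule_with_warmup(step: int, total_steps: int,
                                warmup_steps: int, peak_lr: float) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Warmup phase: ramp from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: ramp from peak_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical hyperparameters, not taken from this model's training run.
TOTAL_STEPS = 5000
WARMUP_STEPS = 1000
PEAK_LR = 1e-4

schedule = [
    linear_schedule_with_warmup(s, TOTAL_STEPS, WARMUP_STEPS, PEAK_LR)
    for s in range(TOTAL_STEPS + 1)
]
```

The list schedule can be plotted or inspected to confirm the triangular shape; the maximum sits exactly at the end of warmup.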

English Speech Recognition

Audio Feature Extraction

Speech-to-Text

Speech Transcription

Meeting Minutes

Automatically convert English meeting recordings into text transcripts

Final evaluation Word Error Rate (WER): approximately 0.7175, i.e. roughly 72 word-level errors per 100 reference words

Voice Notes

Convert personal voice memos into searchable text
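To make the word error rate quoted for these use cases concrete: WER is the word-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal pure-Python version with whitespace tokenization and no text normalization; the function name and sample sentences are illustrative, and the evaluation script behind the reported figure may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences: one substitution and one deletion over six words.
example = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.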

Training Loss	Epoch	Step	Validation Loss	WER
3.1309	1.05	250	3.4541	0.9982
3.0499	2.1	500	3.0231	0.9982
1.4839	3.15	750	1.4387	0.9257
1.1697	4.2	1000	1.3729	0.8792
0.9353	5.25	1250	1.2608	0.8445
0.7298	6.3	1500	1.1867	0.8052
0.6418	7.35	1750	1.2414	0.7997
0.5698	8.4	2000	1.2240	0.7766
0.5084	9.45	2250	1.1910	0.7687
0.4912	10.5	2500	1.2241	0.7617
0.4144	11.55	2750	1.2412	0.7477
0.4153	12.6	3000	1.2736	0.7511
0.4050	13.65	3250	1.2827	0.7328
0.3852	14.7	3500	1.1981	0.7331
0.3829	15.75	3750	1.3035	0.7347
0.3538	16.81	4000	1.3003	0.7240
0.3385	17.86	4250	1.3354	0.7304
0.3108	18.91	4500	1.2983	0.7229
0.3037	19.96	4750	1.3087	0.7175
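A quick way to read the table above is to recompute a few summary figures from it. The sketch below copies the logged rows and derives the approximate number of optimizer steps per epoch, the step with the lowest validation loss, and the step with the lowest WER. The variable names are illustrative; the numbers come only from the table.

```python
# (training loss, epoch, step, validation loss, WER), copied from the table.
ROWS = [
    (3.1309, 1.05, 250, 3.4541, 0.9982),
    (3.0499, 2.10, 500, 3.0231, 0.9982),
    (1.4839, 3.15, 750, 1.4387, 0.9257),
    (1.1697, 4.20, 1000, 1.3729, 0.8792),
    (0.9353, 5.25, 1250, 1.2608, 0.8445),
    (0.7298, 6.30, 1500, 1.1867, 0.8052),
    (0.6418, 7.35, 1750, 1.2414, 0.7997),
    (0.5698, 8.40, 2000, 1.2240, 0.7766),
    (0.5084, 9.45, 2250, 1.1910, 0.7687),
    (0.4912, 10.50, 2500, 1.2241, 0.7617),
    (0.4144, 11.55, 2750, 1.2412, 0.7477),
    (0.4153, 12.60, 3000, 1.2736, 0.7511),
    (0.4050, 13.65, 3250, 1.2827, 0.7328),
    (0.3852, 14.70, 3500, 1.1981, 0.7331),
    (0.3829, 15.75, 3750, 1.3035, 0.7347),
    (0.3538, 16.81, 4000, 1.3003, 0.7240),
    (0.3385, 17.86, 4250, 1.3354, 0.7304),
    (0.3108, 18.91, 4500, 1.2983, 0.7229),
    (0.3037, 19.96, 4750, 1.3087, 0.7175),
]

# Approximate steps per epoch, estimated from the last logged row.
_, last_epoch, last_step, _, _ = ROWS[-1]
steps_per_epoch = round(last_step / last_epoch)

# Rows with the lowest validation loss and the lowest WER.
best_loss_row = min(ROWS, key=lambda row: row[3])
best_wer_row = min(ROWS, key=lambda row: row[4])

print("steps per epoch (approx.):", steps_per_epoch)
print("lowest validation loss:", best_loss_row[3], "at step", best_loss_row[2])
print("lowest WER:", best_wer_row[4], "at step", best_wer_row[2])
```

In these logs the validation loss reaches its minimum well before the end of training, while WER keeps improving through the last logged step, so the two metrics would select different checkpoints.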
