Open-source Speech Recognition Model wav2vec2-base-demo-colab - Speech-to-Text Fine-tuned from wav2vec2-base (31.42% Word Error Rate)

Wav2vec2 Base Demo Colab

Developed by brever

A fine-tuned speech recognition model based on facebook/wav2vec2-base, achieving a word error rate of 31.42% on the evaluation set

Downloads 16

Release Time: 5/22/2022

Model Overview

This model is a fine-tuned version of facebook/wav2vec2-base for automatic speech recognition, suited to applications that convert speech to text.
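As a rough illustration of how such a model might be used, the sketch below loads it through the Hugging Face transformers automatic-speech-recognition pipeline. The Hub ID "brever/wav2vec2-base-demo-colab" is an assumption built from the developer and model names on this page, and transcribe is a hypothetical helper. Decoding audio from a file path also requires ffmpeg to be installed.

```python
# Assumed Hub ID, pieced together from the developer and model names on this page.
MODEL_ID = "brever/wav2vec2-base-demo-colab"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Return the transcript of one audio file (hypothetical helper)."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    # Downloads the model on first use; wav2vec2-base expects 16 kHz audio,
    # and the pipeline resamples files it decodes from a path.
    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(audio_path)
    return result["text"]
```

Calling transcribe("meeting.wav") on a local recording would return the transcript as a plain string. With a WER near 31%, the output should be expected to need manual correction.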

Word Error Rate

Reached a word error rate (WER) of 31.42% on the evaluation set at training step 4000, down from 85.09% at step 500
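For readers unfamiliar with the metric, here is a minimal sketch of the standard WER definition: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The page does not say which tool produced the reported figure, so this is an illustration of the definition and not the evaluation code that was used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # consumed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution (sat -> sit) and one deletion (the) over six reference words.
example = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference, which is why 1 - WER is only an approximate "accuracy".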

Fine-tuned based on wav2vec2-base

Starts from the pretrained facebook/wav2vec2-base checkpoint and architecture instead of training from scratch

Efficient Training

Trained with mixed precision and a linear learning rate scheduler
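A linear learning rate scheduler usually means the rate decays in a straight line to zero over training, often after a linear warmup. The sketch below shows that shape in plain Python. The peak rate, warmup length, and total step count are illustrative assumptions, since this page does not list the hyperparameters, and linear_schedule_lr is a hypothetical helper. Mixed precision is not shown.

```python
def linear_schedule_lr(step: int, peak_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate at a given step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        # Ramp up from 0 to peak_lr over the warmup phase.
        return peak_lr * step / max(1, warmup_steps)
    # Decay from peak_lr at the end of warmup down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Illustrative values only; none of them come from this page.
PEAK_LR = 1e-4
WARMUP_STEPS = 1000
TOTAL_STEPS = 4000

schedule = [linear_schedule_lr(s, PEAK_LR, WARMUP_STEPS, TOTAL_STEPS) for s in range(0, TOTAL_STEPS + 1, 500)]
```

Under these assumed values, the rate peaks at step 1000 and falls back to zero by step 4000.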

Speech Recognition

Audio-to-Text

Speech Transcription

Meeting Minutes

Automatically convert meeting recordings into text transcripts

Estimated word accuracy of about 68.58% (1 - WER, using the 31.42% WER on the evaluation set; not measured on meeting recordings)

Subtitle Generation

Automatically generate subtitles for video content

Training Loss	Epoch	Step	Validation Loss	WER
3.4086	3.45	500	1.1494	0.8509
0.5968	6.9	1000	0.4306	0.4169
0.2363	10.34	1500	0.3820	0.3669
0.1365	13.79	2000	0.3863	0.3487
0.0916	17.24	2500	0.3851	0.3391
0.0704	20.69	3000	0.3759	0.3271
0.0537	24.14	3500	0.3747	0.3222
0.0413	27.59	4000	0.3944	0.3142
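The table can be read programmatically to compare checkpoints. The sketch below copies the rows above and picks out the step with the lowest WER and the step with the lowest validation loss, which turn out to differ. The steps-per-epoch value is only an estimate derived from the last row.

```python
# Rows copied from the table: (training_loss, epoch, step, validation_loss, wer)
rows = [
    (3.4086, 3.45, 500, 1.1494, 0.8509),
    (0.5968, 6.9, 1000, 0.4306, 0.4169),
    (0.2363, 10.34, 1500, 0.3820, 0.3669),
    (0.1365, 13.79, 2000, 0.3863, 0.3487),
    (0.0916, 17.24, 2500, 0.3851, 0.3391),
    (0.0704, 20.69, 3000, 0.3759, 0.3271),
    (0.0537, 24.14, 3500, 0.3747, 0.3222),
    (0.0413, 27.59, 4000, 0.3944, 0.3142),
]

# Checkpoint with the lowest WER and the one with the lowest validation loss.
best_by_wer = min(rows, key=lambda r: r[4])
best_by_val_loss = min(rows, key=lambda r: r[3])

# Rough optimizer steps per epoch, estimated from the final row.
steps_per_epoch = rows[-1][2] / rows[-1][1]

print(f"lowest WER: {best_by_wer[4]:.4f} at step {best_by_wer[2]}")
print(f"lowest validation loss: {best_by_val_loss[3]:.4f} at step {best_by_val_loss[2]}")
print(f"approx. steps per epoch: {steps_per_epoch:.0f}")
```

Validation loss is lowest at step 3500 while WER keeps improving to step 4000, even as training loss drops to 0.0413. That gap between training and validation loss suggests the model is starting to overfit, so the choice of final checkpoint depends on which metric matters more.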
