ctrlv-wav2vec2-tokenizer: Open-Source Speech Recognition Model Fine-Tuned from wav2vec2-base

Ctrlv Wav2vec2 Tokenizer

Developed by proseph

A speech recognition model fine-tuned from facebook/wav2vec2-base, reaching a word error rate (WER) of 31.38% on the evaluation set

Downloads: 25

Release Date: 4/20/2022

Model Overview

This is a speech recognition model built on the wav2vec2 architecture and intended for converting speech to text.
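Assuming this checkpoint uses the standard wav2vec2 CTC head (the usual setup when wav2vec2-base is fine-tuned for transcription), the network outputs one score vector per audio frame of roughly 20 ms. Text is recovered by picking the best token per frame, merging consecutive repeats, and dropping the blank token. The sketch below shows that greedy decoding step in plain Python. The five-token vocabulary is hypothetical, and treating `<pad>` as the blank and `|` as the word separator follows the Hugging Face `Wav2Vec2CTCTokenizer` convention rather than anything stated for this model.

```python
from typing import List, Optional, Sequence

# Hypothetical toy vocabulary; a real checkpoint ships its own vocab.json.
VOCAB = ["<pad>", "|", "a", "c", "t"]
BLANK_ID = 0           # assumption: <pad> doubles as the CTC blank
WORD_DELIMITER = "|"   # assumption: "|" separates words


def argmax(scores: Sequence[float]) -> int:
    """Index of the highest score in one frame (first index wins ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def ctc_greedy_decode(frame_scores: List[Sequence[float]]) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    best_ids = [argmax(frame) for frame in frame_scores]
    chars: List[str] = []
    previous: Optional[int] = None
    for token_id in best_ids:
        # A repeated token counts once unless a blank separates the copies.
        if token_id != previous and token_id != BLANK_ID:
            chars.append(VOCAB[token_id])
        previous = token_id
    # Turn word delimiters into spaces and trim the ends.
    return "".join(chars).replace(WORD_DELIMITER, " ").strip()


def one_hot(index: int) -> List[float]:
    """Fake frame scores that put all weight on a single token."""
    return [1.0 if j == index else 0.0 for j in range(len(VOCAB))]


frame_ids = [3, 3, 0, 2, 4, 4, 1, 3, 2, 2, 4]
print(ctc_greedy_decode([one_hot(i) for i in frame_ids]))  # prints: cat cat
```

A blank between two identical tokens is what lets doubled letters survive decoding: ids `[2, 0, 2]` decode to `aa`, while `[2, 2, 2]` collapses to `a`.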

Efficient Fine-tuning

Fine-tuned from the wav2vec2-base checkpoint, with good results on a relatively small dataset

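Word Error Rate

Achieved a 31.38% word error rate on the evaluation set. The base model, facebook/wav2vec2-base, is pretrained on audio only and cannot transcribe speech until it is fine-tuned, so this figure measures what fine-tuning added rather than a head-to-head comparison with another recognizer.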
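Word error rate is the word-level edit distance (substitutions, deletions, and insertions) between the model's transcript and the reference, divided by the number of reference words. A minimal pure-Python sketch of that definition follows. It applies no text normalization (casing, punctuation), which evaluation tools such as `jiwer` may handle differently, so it is not necessarily how the 31.38% figure was computed.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = minimum edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete every reference word
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[len(ref)][len(hyp)] / len(ref)


# One substituted word out of five reference words.
print(word_error_rate("the meeting starts at noon", "the meeting start at noon"))  # prints: 0.2
```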

Optimized Training

Used a linear learning-rate schedule with 1,000 warm-up steps to keep early training stable
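Only the schedule type and the 1,000 warm-up steps are given here. The sketch below assumes the common Hugging Face `linear` schedule shape (as produced by `transformers.get_linear_schedule_with_warmup`): the learning rate ramps from 0 to its peak over the warm-up steps, then decays linearly to 0 by the last step. The peak learning rate and total step count are hypothetical placeholders.

```python
def linear_warmup_decay_lr(step: int, peak_lr: float, warmup_steps: int,
                           total_steps: int) -> float:
    """Learning rate at `step`: linear ramp 0 -> peak, then linear decay peak -> 0."""
    if step < warmup_steps:
        # Warm-up phase: grow proportionally to the step count.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: shrink in proportion to the steps still remaining.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


PEAK_LR = 1e-4       # hypothetical peak learning rate
WARMUP_STEPS = 1000  # warm-up length stated for this model
TOTAL_STEPS = 4500   # hypothetical total number of optimizer steps

for step in (0, 500, 1000, 2750, 4500):
    print(step, linear_warmup_decay_lr(step, PEAK_LR, WARMUP_STEPS, TOTAL_STEPS))
```

Ramping up gradually keeps the first updates small while the newly added output layer is still randomly initialized, which is the usual motivation for warm-up.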

Capabilities

Speech-to-Text

Automatic Speech Recognition

Use Cases: Speech Transcription

Meeting Minutes

Automatically convert meeting recordings into text transcripts

Word accuracy of roughly 68.62% (1 - WER, given a 31.38% WER); because WER also counts inserted words, 1 - WER is only an approximation of accuracy
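Because WER is (S + D + I) / N, with S substitutions, D deletions, I insertions, and N reference words, inserted words can push WER above 100%, and 1 - WER then goes negative. The sketch below uses hypothetical error counts to show a case close to this model's figure and an insertion-heavy case.

```python
def wer_from_counts(substitutions: int, deletions: int, insertions: int,
                    reference_words: int) -> float:
    """WER = (S + D + I) / N, where N is the number of words in the reference."""
    if reference_words <= 0:
        raise ValueError("reference_words must be positive")
    return (substitutions + deletions + insertions) / reference_words


# Hypothetical counts: 100 reference words, 20 substituted, 5 dropped, 6 inserted.
typical = wer_from_counts(20, 5, 6, 100)
print(f"WER={typical:.2f}  1-WER={1 - typical:.2f}")  # prints: WER=0.31  1-WER=0.69

# Hypothetical insertion-heavy case: 10 reference words, 2 substituted, 12 extra words.
noisy = wer_from_counts(2, 0, 12, 10)
print(f"WER={noisy:.2f}  1-WER={1 - noisy:.2f}")  # prints: WER=1.40  1-WER=-0.40
```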

Voice Notes

Convert voice memos into searchable text

Training Results

Training Loss	Epoch	Step	Validation Loss	WER
3.4359	3.45	500	1.3595	0.9159
0.5692	6.9	1000	0.4332	0.4036
0.2198	10.34	1500	0.4074	0.3678
0.1314	13.79	2000	0.3480	0.3409
0.0929	17.24	2500	0.3714	0.3346
0.0692	20.69	3000	0.3977	0.3224
0.0542	24.14	3500	0.4068	0.3187
0.0422	27.59	4000	0.3967	0.3138
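In the training results, validation loss bottoms out at step 2000 (0.3480) while WER keeps improving to 0.3138 at step 4000, so choosing a checkpoint by loss or by WER gives different answers. A small sketch that makes the comparison explicit, using the rows of that table:

```python
# (step, validation loss, WER) rows copied from the training-results table.
RESULTS = [
    (500, 1.3595, 0.9159),
    (1000, 0.4332, 0.4036),
    (1500, 0.4074, 0.3678),
    (2000, 0.3480, 0.3409),
    (2500, 0.3714, 0.3346),
    (3000, 0.3977, 0.3224),
    (3500, 0.4068, 0.3187),
    (4000, 0.3967, 0.3138),
]

# Pick the row with the lowest validation loss and the row with the lowest WER.
best_by_loss = min(RESULTS, key=lambda row: row[1])
best_by_wer = min(RESULTS, key=lambda row: row[2])

print("lowest validation loss at step", best_by_loss[0])  # prints: lowest validation loss at step 2000
print("lowest WER at step", best_by_wer[0])  # prints: lowest WER at step 4000
```

Over the same span, training loss keeps falling (0.1314 to 0.0422) while validation loss drifts back up, a common sign that the model is fitting the training set more closely; WER still improved, so which checkpoint counts as best depends on the metric chosen.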
