The wav2vec2-base-ft-cv3-v3 open-source speech recognition model - Precise recognition of English speech with a low error rate

Wav2vec2 Base Ft Cv3 V3

Developed by danieleV9H

This model is a fine-tuned speech recognition model based on facebook/wav2vec2-base on the Common Voice 3.0 English dataset, achieving a word error rate of 0.247 on the test set.

Speech Recognition

Transformers

Open Source License: Apache-2.0 #Speech Recognition #English-specific #Low Word Error Rate

Downloads 120

Release Time: 6/25/2022

Model Overview

A fine-tuned model for English speech recognition, based on the wav2vec2 architecture and trained on the Common Voice dataset.

Model Features

Low Word Error Rate

Achieves a word error rate (WER) of 0.247 on the Common Voice 3.0 English test split.

Based on wav2vec2 Architecture

Uses facebook/wav2vec2-base as the base model, a speech encoder pretrained with self-supervision on unlabeled audio.

Linear Learning Rate Scheduling

Uses a linear learning rate schedule (lr_scheduler_type: linear) with a peak learning rate of 5e-05.
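As a rough illustration of what a linear schedule does, the sketch below computes the learning rate at a given optimizer step. It assumes no warmup (this page does not state a warmup setting), the peak value of 5e-05 from the hyperparameter list, and a total of 57,000 steps taken from the last logged row of the training results; the function name `linear_lr` is hypothetical.

```python
def linear_lr(step: int, total_steps: int, peak_lr: float = 5e-05,
              warmup_steps: int = 0) -> float:
    """Optional linear warmup, then linear decay to zero at total_steps."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if warmup_steps > 0 and step < warmup_steps:
        # Ramp up from 0 to peak_lr over the warmup phase.
        return peak_lr * step / warmup_steps
    # Decay from peak_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    total = 57_000  # assumption: last step in the logged training results
    for step in (0, 14_250, 28_500, 57_000):
        print(step, linear_lr(step, total))
```

Under these assumptions the rate is halved by the midpoint of training and reaches zero at the final step.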

Model Capabilities

English Speech Recognition

Audio-to-Text Conversion

Use Cases

Speech Transcription

Voice Memo Transcription

Automatically converts user voice memos into text

Approximately 75.3% word-level accuracy (computed as 1 - WER = 1 - 0.247)

Meeting Minutes

Automatically generates text versions of meeting audio recordings
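The 75.3% figure quoted above is 1 - WER. WER counts word-level substitutions, deletions, and insertions and divides by the number of reference words, so 1 - WER is only a rough proxy for accuracy and can even go negative when the output contains many inserted words. A minimal sketch of the calculation follows; the helper name `word_error_rate` and the example sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("please call me back tomorrow",
                          "please call be back tomorrow")
    print(f"WER = {wer:.3f}, 1 - WER = {1 - wer:.3f}")
```

One substituted word out of five reference words gives a WER of 0.2 here; the model's reported 0.247 means roughly one word-level error for every four reference words.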

🚀 wav2vec2-base-ft-cv3-v3

This model is a fine-tuned version of facebook/wav2vec2-base on the "mozilla-foundation/common_voice_3_0" English dataset, aiming to improve speech recognition performance.

🚀 Quick Start

The model was trained on the "train" and "validation" splits of the "mozilla-foundation/common_voice_3_0" English dataset, with the "test" split used for validation. It achieves the following results on the evaluation set:

Loss: 0.5787
Wer: 0.2470
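A minimal transcription sketch using the Transformers `pipeline` API is shown below. The Hub identifier `danieleV9H/wav2vec2-base-ft-cv3-v3` is inferred from the developer and model names on this page and should be treated as an assumption, as should the helper name `transcribe`. Passing a file path to the pipeline relies on ffmpeg being installed for decoding.

```python
# Assumption: Hub id inferred from the developer and model names on this page.
MODEL_ID = "danieleV9H/wav2vec2-base-ft-cv3-v3"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one English audio file and return the recognized text."""
    # Imported lazily so this module loads even without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(audio_path)  # dict containing a "text" field
    return result["text"]
```

Calling `transcribe("sample.wav")` with a placeholder path to a local recording would download the model on first use and return the transcript as a string.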

🔧 Technical Details

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 12
mixed_precision_training: Native AMP
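The listed values map onto Hugging Face `TrainingArguments` roughly as sketched below. This is not the author's training script: `output_dir` is a hypothetical path, the batch sizes are assumed to be per-device values, and anything not listed above (warmup, gradient accumulation, evaluation cadence) is left at library defaults.

```python
# Keyword arguments mirroring the hyperparameters listed on this page.
TRAINING_KWARGS = {
    "output_dir": "./wav2vec2-base-ft-cv3-v3",  # hypothetical output path
    "learning_rate": 5e-05,
    "per_device_train_batch_size": 16,  # assumption: per-device value
    "per_device_eval_batch_size": 16,   # assumption: per-device value
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 12,
    "fp16": True,  # "Native AMP"; mixed precision needs a CUDA-capable GPU
}


def build_training_arguments():
    """Build a TrainingArguments object from the values above."""
    # Imported lazily so the dictionary can be inspected without transformers.
    from transformers import TrainingArguments

    return TrainingArguments(**TRAINING_KWARGS)
```

Keeping the values in a plain dictionary makes them easy to log or diff against another run before a `Trainer` is constructed.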

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
3.5935	0.1	500	3.0085	1.0
1.6296	0.21	1000	1.0879	0.5895
0.7154	0.31	1500	0.8224	0.4839
0.6387	0.42	2000	0.7290	0.4302
0.5322	0.52	2500	0.6864	0.4044
0.497	0.63	3000	0.6294	0.3746
0.4659	0.73	3500	0.6388	0.3745
0.4452	0.84	4000	0.6122	0.3570
0.4356	0.94	4500	0.5770	0.3443
0.3976	1.05	5000	0.6145	0.3296
0.3767	1.15	5500	0.6099	0.3325
0.3704	1.25	6000	0.5998	0.3263
0.3541	1.36	6500	0.6070	0.3250
0.3592	1.46	7000	0.6076	0.3352
0.3508	1.57	7500	0.5712	0.3239
0.3437	1.67	8000	0.5729	0.3202
0.352	1.78	8500	0.5465	0.3100
0.34	1.88	9000	0.5418	0.3059
0.4086	1.99	9500	0.5189	0.3053
0.2968	2.09	10000	0.5373	0.3076
0.2968	2.2	10500	0.5602	0.3061
0.2956	2.3	11000	0.5651	0.3051
0.2863	2.41	11500	0.5476	0.2982
0.2852	2.51	12000	0.5579	0.2954
0.292	2.61	12500	0.5451	0.2953
0.2877	2.72	13000	0.5468	0.2905
0.285	2.82	13500	0.5283	0.2908
0.2872	2.93	14000	0.5240	0.2867
0.3286	3.03	14500	0.5078	0.2846
0.2526	3.14	15000	0.5373	0.2836
0.2494	3.24	15500	0.5566	0.2861
0.2534	3.35	16000	0.5378	0.2859
0.2435	3.45	16500	0.5225	0.2813
0.3144	3.56	17000	0.5203	0.2808
0.2501	3.66	17500	0.5176	0.2785
0.2469	3.76	18000	0.5022	0.2795
0.242	3.87	18500	0.5228	0.2757
0.242	3.97	19000	0.5024	0.2788
0.2205	4.08	19500	0.5318	0.2729
0.2149	4.18	20000	0.5492	0.2763
0.2186	4.29	20500	0.5599	0.2769
0.2191	4.39	21000	0.5493	0.2695
0.218	4.5	21500	0.5385	0.2709
0.2046	4.6	22000	0.5326	0.2718
0.2064	4.71	22500	0.5591	0.2725
0.2066	4.81	23000	0.5283	0.2700
0.2102	4.92	23500	0.5456	0.2713
0.3345	5.02	24000	0.5474	0.2698
0.1891	5.12	24500	0.5466	0.2672
0.1954	5.23	25000	0.5691	0.2731
0.1971	5.33	25500	0.5595	0.2741
0.1995	5.44	26000	0.5609	0.2716
0.1911	5.54	26500	0.5513	0.2684
0.1954	5.65	27000	0.5282	0.2683
0.193	5.75	27500	0.5460	0.2644
0.1974	5.86	28000	0.5415	0.2650
0.1947	5.96	28500	0.5227	0.2656
0.1836	6.07	29000	0.5361	0.2743
0.1741	6.17	29500	0.5637	0.2649
0.1776	6.27	30000	0.5705	0.2680
0.1747	6.38	30500	0.5587	0.2667
0.1761	6.48	31000	0.5480	0.2683
0.1715	6.59	31500	0.5547	0.2627
0.2424	6.69	32000	0.5254	0.2610
0.1756	6.8	32500	0.5301	0.2633
0.1761	6.9	33000	0.5267	0.2658
0.1751	7.01	33500	0.5611	0.2677
0.1653	7.11	34000	0.5617	0.2663
0.1591	7.22	34500	0.5435	0.2642
0.1559	7.32	35000	0.5608	0.2611
0.1604	7.43	35500	0.5477	0.2611
0.162	7.53	36000	0.5257	0.2559
0.1579	7.63	36500	0.5398	0.2570
0.162	7.74	37000	0.5566	0.2605
0.2351	7.84	37500	0.5371	0.2564
0.1566	7.95	38000	0.5507	0.2565
0.1515	8.05	38500	0.5640	0.2544
0.1459	8.16	39000	0.5739	0.2523
0.1463	8.26	39500	0.5596	0.2522
0.1466	8.37	40000	0.5522	0.2537
0.2372	8.47	40500	0.5567	0.2520
0.1488	8.58	41000	0.5546	0.2506
0.1492	8.68	41500	0.5533	0.2518
0.1454	8.78	42000	0.5488	0.2508
0.148	8.89	42500	0.5635	0.2526
0.1424	8.99	43000	0.5513	0.2509
0.1356	9.1	43500	0.5534	0.2527
0.1346	9.2	44000	0.5735	0.2497
0.1346	9.31	44500	0.5710	0.2489
0.1401	9.41	45000	0.5561	0.2491
0.2212	9.52	45500	0.5564	0.2482
0.1369	9.62	46000	0.5658	0.2484
0.1323	9.73	46500	0.5582	0.2495
0.1369	9.83	47000	0.5560	0.2503
0.1368	9.94	47500	0.5552	0.2489
0.1333	10.04	48000	0.5953	0.2491
0.1305	10.14	48500	0.5818	0.2520
0.1316	10.25	49000	0.5773	0.2506
0.1334	10.35	49500	0.5882	0.2485
0.1351	10.46	50000	0.5750	0.2483
0.1337	10.56	50500	0.5910	0.2486
0.2241	10.67	51000	0.5732	0.2491
0.1327	10.77	51500	0.5839	0.2493
0.1364	10.88	52000	0.5724	0.2464
0.1305	10.98	52500	0.5758	0.2468
0.128	11.09	53000	0.5811	0.2482
0.1267	11.19	53500	0.5903	0.2483
0.1262	11.29	54000	0.5792	0.2483
0.1291	11.4	54500	0.5735	0.2497
0.1228	11.5	55000	0.5920	0.2494
0.1249	11.61	55500	0.5907	0.2488
0.1266	11.71	56000	0.5786	0.2486
0.1235	11.82	56500	0.5790	0.2473
0.1243	11.92	57000	0.5787	0.2470
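The final row is not the lowest-WER row in the log: step 52000 reaches 0.2464, against 0.2470 at step 57000. The sketch below shows one way to locate such rows, assuming the table is available as tab-separated text. Only a handful of rows are copied from the table for brevity, and the helper names `parse_log` and `best_by` are hypothetical.

```python
import csv
import io

# A few rows copied from the training results table (tab-separated).
LOG_EXCERPT = "\n".join([
    "Training Loss\tEpoch\tStep\tValidation Loss\tWer",
    "3.5935\t0.1\t500\t3.0085\t1.0",
    "0.2469\t3.76\t18000\t0.5022\t0.2795",
    "0.1364\t10.88\t52000\t0.5724\t0.2464",
    "0.1305\t10.98\t52500\t0.5758\t0.2468",
    "0.1243\t11.92\t57000\t0.5787\t0.2470",
])


def parse_log(text: str) -> list:
    """Parse the tab-separated log into a list of typed dictionaries."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        {
            "train_loss": float(row["Training Loss"]),
            "epoch": float(row["Epoch"]),
            "step": int(row["Step"]),
            "val_loss": float(row["Validation Loss"]),
            "wer": float(row["Wer"]),
        }
        for row in reader
    ]


def best_by(rows: list, key: str) -> dict:
    """Return the row with the smallest value for the given metric."""
    return min(rows, key=lambda row: row[key])


if __name__ == "__main__":
    rows = parse_log(LOG_EXCERPT)
    print("lowest WER:", best_by(rows, "wer"))
    print("lowest validation loss:", best_by(rows, "val_loss"))
```

In this excerpt the lowest validation loss and the lowest WER fall on different steps, which is a reason to choose the checkpoint selection metric deliberately.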

Framework versions

Transformers 4.19.2
PyTorch 1.11.0+cu113
Datasets 2.2.0
Tokenizers 0.12.1
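To compare a local environment against these versions, something like the following sketch could be used. It assumes the PyPI distribution names `transformers`, `torch`, `datasets`, and `tokenizers`; the helper name `compare_versions` is hypothetical. A mismatch does not necessarily mean the model will fail to load, only that results may differ from those reported here.

```python
from importlib import metadata

# Versions listed on this page, keyed by assumed PyPI distribution name.
PINNED = {
    "transformers": "4.19.2",
    "torch": "1.11.0+cu113",
    "datasets": "2.2.0",
    "tokenizers": "0.12.1",
}


def compare_versions(pinned: dict = PINNED) -> dict:
    """Map each package to (wanted, installed or None, exact match?)."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in compare_versions().items():
        status = "OK" if ok else "differs"
        print(f"{name}: wanted {wanted}, installed {installed} ({status})")
```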

📄 License

This project is licensed under the Apache-2.0 license.
