# wav2vec2-bert-CV16-en

This model is a fine-tuned version of ylacombe/w2v-bert-2.0 on the MOZILLA-FOUNDATION/COMMON_VOICE_16_0 - EN dataset. It is designed for automatic speech recognition, aiming to provide more accurate speech-to-text conversion.
## Quick Start

This model can be used for automatic speech recognition tasks. You can fine-tune it on your own dataset or use it directly for inference.
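A minimal inference sketch using the Transformers `pipeline` API. The repository id below is a placeholder, not confirmed by the original README; substitute the model's actual Hub id.

```python
def build_asr_pipeline(model_id="your-username/wav2vec2-bert-CV16-en"):
    """Build an automatic-speech-recognition pipeline.

    The model_id default is a placeholder; requires `transformers` and
    `torch` to be installed. Imported lazily so the sketch stands alone.
    """
    from transformers import pipeline
    return pipeline("automatic-speech-recognition", model=model_id)

# Usage (downloads model weights on first call):
# asr = build_asr_pipeline()
# print(asr("audio_sample.wav")["text"])
```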
## Features

- Fine-tuned: Based on the pre-trained model ylacombe/w2v-bert-2.0, fine-tuned on the MOZILLA-FOUNDATION/COMMON_VOICE_16_0 - EN dataset.
- Performance Metrics: Achieves a loss of 0.2427, a WER of 0.1455, and a CER of 0.0580 on the evaluation set.
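For reference, WER (word error rate) and CER (character error rate) are both edit-distance-based metrics: the Levenshtein distance between reference and hypothesis, divided by the reference length, over words and characters respectively. A minimal pure-Python sketch (function names are illustrative, not from this model's code):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (one-row DP)."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            # deletion, insertion, substitution/match
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[-1]

def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character error rate: char-level edit distance / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```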
## Installation

The installation steps are not provided in the original README. You will likely need the Hugging Face Transformers library and a recent PyTorch; refer to the official Transformers documentation for details.
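A typical environment setup for this model family (the package list is an assumption, not taken from the original README):

```shell
# Libraries commonly needed for Wav2Vec2-BERT fine-tuning and inference
pip install transformers datasets torch accelerate
```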
## Documentation

### Model description

This model is a fine-tuned version of ylacombe/w2v-bert-2.0 on the MOZILLA-FOUNDATION/COMMON_VOICE_16_0 - EN dataset.
### Intended uses & limitations

More information needed.

### Training and evaluation data

More information needed.

### Training procedure
#### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 12
- eval_batch_size: 12
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- total_train_batch_size: 36
- total_eval_batch_size: 36
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 10000
- num_epochs: 3.0
- mixed_precision_training: Native AMP
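The effective batch size follows from the per-device batch size times the number of GPUs, and the linear scheduler ramps the learning rate up over the warmup steps and then decays it. A small sketch of both (the `total_steps` value is illustrative; the actual total depends on dataset size and epochs):

```python
def linear_schedule_lr(step, base_lr=5e-5, warmup_steps=10_000, total_steps=60_000):
    """Linear warmup from 0 to base_lr, then linear decay to 0.

    total_steps is a hypothetical value for illustration only.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

# Effective batch size: per-device batch size * number of devices
effective_train_batch = 12 * 3  # matches total_train_batch_size: 36
```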
#### Training results

| Training Loss | Epoch | Step | Validation Loss | WER | CER |
| --- | --- | --- | --- | --- | --- |
2.9554 | 0.01 | 250 | 3.1731 | 0.9999 | 0.9942 |
2.7058 | 0.02 | 500 | 2.6717 | 1.0307 | 0.7486 |
0.9641 | 0.02 | 750 | 0.9895 | 0.6091 | 0.2035 |
0.6935 | 0.03 | 1000 | 0.7740 | 0.4821 | 0.1562 |
0.617 | 0.04 | 1250 | 0.6751 | 0.4008 | 0.1303 |
0.4826 | 0.05 | 1500 | 0.5920 | 0.3499 | 0.1170 |
0.4252 | 0.06 | 1750 | 0.5659 | 0.3056 | 0.1053 |
0.472 | 0.07 | 2000 | 0.5066 | 0.2869 | 0.1007 |
0.4042 | 0.07 | 2250 | 0.4604 | 0.2662 | 0.0950 |
0.4279 | 0.08 | 2500 | 0.5165 | 0.2587 | 0.0948 |
0.3586 | 0.09 | 2750 | 0.4440 | 0.2461 | 0.0895 |
0.2715 | 0.1 | 3000 | 0.5096 | 0.2468 | 0.0904 |
0.413 | 0.11 | 3250 | 0.4416 | 0.2350 | 0.0879 |
0.3142 | 0.11 | 3500 | 0.4591 | 0.2280 | 0.0856 |
0.286 | 0.12 | 3750 | 0.4529 | 0.2284 | 0.0860 |
0.3112 | 0.13 | 4000 | 0.4621 | 0.2320 | 0.0875 |
0.3294 | 0.14 | 4250 | 0.4528 | 0.2294 | 0.0862 |
0.3522 | 0.15 | 4500 | 0.4279 | 0.2287 | 0.0861 |
0.2977 | 0.15 | 4750 | 0.4403 | 0.2200 | 0.0830 |
0.2391 | 0.16 | 5000 | 0.4360 | 0.2161 | 0.0831 |
0.3025 | 0.17 | 5250 | 0.4214 | 0.2157 | 0.0831 |
0.309 | 0.18 | 5500 | 0.4060 | 0.2125 | 0.0818 |
0.2872 | 0.19 | 5750 | 0.4233 | 0.2189 | 0.0824 |
0.2796 | 0.2 | 6000 | 0.4055 | 0.2151 | 0.0823 |
0.2609 | 0.2 | 6250 | 0.4374 | 0.2194 | 0.0853 |
0.283 | 0.21 | 6500 | 0.4288 | 0.2215 | 0.0877 |
0.3028 | 0.22 | 6750 | 0.4180 | 0.2166 | 0.0837 |
0.2565 | 0.23 | 7000 | 0.4476 | 0.2268 | 0.0892 |
0.2824 | 0.24 | 7250 | 0.4057 | 0.2195 | 0.0850 |
0.325 | 0.24 | 7500 | 0.3926 | 0.2157 | 0.0849 |
0.336 | 0.25 | 7750 | 0.4469 | 0.2208 | 0.0879 |
0.304 | 0.26 | 8000 | 0.4292 | 0.2245 | 0.0886 |
0.2457 | 0.27 | 8250 | 0.4198 | 0.2204 | 0.0856 |
0.2768 | 0.28 | 8500 | 0.4330 | 0.2184 | 0.0859 |
0.2165 | 0.29 | 8750 | 0.4276 | 0.2173 | 0.0864 |
0.3015 | 0.29 | 9000 | 0.4255 | 0.2223 | 0.0882 |
0.308 | 0.3 | 9250 | 0.4356 | 0.2318 | 0.0925 |
0.2981 | 0.31 | 9500 | 0.4514 | 0.2226 | 0.0884 |
0.2944 | 0.32 | 9750 | 0.4182 | 0.2293 | 0.0901 |
0.3298 | 0.33 | 10000 | 0.4290 | 0.2275 | 0.0892 |
0.2523 | 0.33 | 10250 | 0.4032 | 0.2191 | 0.0865 |
0.2887 | 0.34 | 10500 | 0.4218 | 0.2284 | 0.0917 |
0.3156 | 0.35 | 10750 | 0.3930 | 0.2271 | 0.0898 |
0.2526 | 0.36 | 11000 | 0.4367 | 0.2304 | 0.0928 |
0.2561 | 0.37 | 11250 | 0.4261 | 0.2279 | 0.0916 |
0.2291 | 0.37 | 11500 | 0.4401 | 0.2231 | 0.0899 |
0.2521 | 0.38 | 11750 | 0.4101 | 0.2232 | 0.0895 |
0.2249 | 0.39 | 12000 | 0.4021 | 0.2270 | 0.0913 |
0.2917 | 0.4 | 12250 | 0.4124 | 0.2267 | 0.0915 |
0.2436 | 0.41 | 12500 | 0.4197 | 0.2257 | 0.0903 |
0.2976 | 0.42 | 12750 | 0.3951 | 0.2230 | 0.0896 |
0.2333 | 0.42 | 13000 | 0.4099 | 0.2250 | 0.0901 |
0.2261 | 0.43 | 13250 | 0.4328 | 0.2168 | 0.0876 |
0.2514 | 0.44 | 13500 | 0.3947 | 0.2208 | 0.0895 |
0.296 | 0.45 | 13750 | 0.3953 | 0.2149 | 0.0859 |
0.2426 | 0.46 | 14000 | 0.3831 | 0.2119 | 0.0852 |
0.2258 | 0.46 | 14250 | 0.4060 | 0.2263 | 0.0915 |
0.2565 | 0.47 | 14500 | 0.4057 | 0.2237 | 0.0901 |
0.2834 | 0.48 | 14750 | 0.4112 | 0.2167 | 0.0876 |
0.234 | 0.49 | 15000 | 0.3802 | 0.2133 | 0.0852 |
0.3084 | 0.5 | 15250 | 0.3837 | 0.2151 | 0.0871 |
0.3051 | 0.51 | 15500 | 0.3848 | 0.2145 | 0.0867 |
0.2364 | 0.51 | 15750 | 0.3817 | 0.2134 | 0.0870 |
0.2345 | 0.52 | 16000 | 0.3883 | 0.2163 | 0.0874 |
0.2235 | 0.53 | 16250 | 0.3740 | 0.2136 | 0.0869 |
0.2365 | 0.54 | 16500 | 0.3711 | 0.2112 | 0.0850 |
0.2449 | 0.55 | 16750 | 0.3805 | 0.2127 | 0.0858 |
0.2569 | 0.55 | 17000 | 0.3794 | 0.2124 | 0.0863 |
0.2273 | 0.56 | 17250 | 0.3922 | 0.2207 | 0.0895 |
0.2492 | 0.57 | 17500 | 0.3670 | 0.2195 | 0.0874 |
0.236 | 0.58 | 17750 | 0.3799 | 0.2120 | 0.0862 |
0.2823 | 0.59 | 18000 | 0.3734 | 0.2144 | 0.0867 |
0.2349 | 0.59 | 18250 | 0.3972 | 0.2175 | 0.0889 |
0.2156 | 0.6 | 18500 | 0.3729 | 0.2157 | 0.0867 |
0.2812 | 0.61 | 18750 | 0.3905 | 0.2117 | 0.0854 |
0.242 | 0.62 | 19000 | 0.3912 | 0.2114 | 0.0855 |
0.2237 | 0.63 | 19250 | 0.3794 | 0.2155 | 0.0877 |
0.255 | 0.64 | 19500 | 0.3770 | 0.2079 | 0.0840 |
0.1899 | 0.64 | 19750 | 0.3796 | 0.2145 | 0.0868 |
0.2793 | 0.65 | 20000 | 0.3784 | 0.2145 | 0.0863 |
0.2099 | 0.66 | 20250 | 0.3956 | 0.2161 | 0.0875 |
0.22 | 0.67 | 20500 | 0.3804 | 0.2135 | 0.0875 |
0.2213 | 0.68 | 20750 | 0.3803 | 0.2100 | 0.0849 |
0.245 | 0.68 | 21000 | 0.3783 | 0.2142 | 0.0870 |
0.2188 | 0.69 | 21250 | 0.3873 | 0.2163 | 0.0861 |
0.2613 | 0.7 | 21500 | 0.3646 | 0.2105 | 0.0844 |
0.1907 | 0.71 | 21750 | 0.3830 | 0.2101 | 0.0853 |
0.2095 | 0.72 | 22000 | 0.3794 | 0.2087 | 0.0849 |
0.2319 | 0.73 | 22250 | 0.3548 | 0.2087 | 0.0842 |
0.2049 | 0.73 | 22500 | 0.3782 | 0.2075 | 0.0837 |
0.2248 | 0.74 | 22750 | 0.3736 | 0.2100 | 0.0845 |
0.2277 | 0.75 | 23000 | 0.3712 | 0.2105 | 0.0845 |
0.2115 | 0.76 | 23250 | 0.3722 | 0.2124 | 0.0859 |
0.2001 | 0.77 | 23500 | 0.3602 | 0.2072 | 0.0832 |
0.2095 | 0.77 | 23750 | 0.3607 | 0.2106 | 0.0851 |
0.2286 | 0.78 | 24000 | 0.3810 | 0.2132 | 0.0876 |
0.2284 | 0.79 | 24250 | 0.3677 | 0.2066 | 0.0847 |
0.2003 | 0.8 | 24500 | 0.3650 | 0.2098 | 0.0847 |
0.1992 | 0.81 | 24750 | 0.3491 | 0.2019 | 0.0813 |
0.224 | 0.81 | 25000 | 0.3602 | 0.2043 | 0.0825 |
0.2181 | 0.82 | 25250 | 0.3712 | 0.2120 | 0.0867 |
0.2226 | 0.83 | 25500 | 0.3657 | 0.2028 | 0.0830 |
0.1912 | 0.84 | 25750 | 0.3662 | 0.2076 | 0.0846 |
0.2283 | 0.85 | 26000 | 0.3505 | 0.2049 | 0.0825 |
0.2068 | 0.86 | 26250 | 0.3622 | 0.2111 | 0.0852 |
0.2444 | 0.86 | 26500 | 0.3660 | 0.2055 | 0.0840 |
0.2055 | 0.87 | 26750 | 0.3625 | 0.2055 | 0.0830 |
0.2074 | 0.88 | 27000 | 0.3566 | 0.1981 | 0.0812 |
0.2019 | 0.89 | 27250 | 0.3537 | 0.2038 | 0.0822 |
0.2174 | 0.9 | 27500 | 0.3664 | 0.1990 | 0.0809 |
0.2009 | 0.9 | 27750 | 0.3512 | 0.2035 | 0.0821 |
0.211 | 0.91 | 28000 | 0.3707 | 0.2068 | 0.0846 |
0.2541 | 0.92 | 28250 | 0.3435 | 0.1992 | 0.0812 |
0.2108 | 0.93 | 28500 | 0.3461 | 0.2046 | 0.0828 |
0.2274 | 0.94 | 28750 | 0.3364 | 0.1998 | 0.0812 |
0.2175 | 0.95 | 29000 | 0.3742 | 0.2113 | 0.0864 |
0.2368 | 0.95 | 29250 | 0.3431 | 0.2051 | 0.0833 |
0.1831 | 0.96 | 29500 | 0.3468 | 0.2034 | 0.0825 |
0.2202 | 0.97 | 29750 | 0.3342 | 0.1964 | 0.0791 |
0.183 | 0.98 | 30000 | 0.3413 | 0.1966 | 0.0792 |
0.1958 | 0.99 | 30250 | 0.3466 | 0.1991 | 0.0809 |
0.2167 | 0.99 | 30500 | 0.3530 | 0.2024 | 0.0816 |
0.2057 | 1.0 | 30750 | 0.3334 | 0.1960 | 0.0788 |
0.1982 | 1.01 | 31000 | 0.3312 | 0.1951 | 0.0789 |
0.2123 | 1.02 | 31250 | 0.3285 | 0.1955 | 0.0785 |
0.2269 | 1.03 | 31500 | 0.3548 | 0.2034 | 0.0812 |
0.2056 | 1.03 | 31750 | 0.3433 | 0.1969 | 0.0793 |
0.2234 | 1.04 | 32000 | 0.3446 | 0.1981 | 0.0805 |
0.1913 | 1.05 | 32250 | 0.3465 | 0.1969 | 0.0792 |
0.2005 | 1.06 | 32500 | 0.3348 | 0.1947 | 0.0784 |
0.2017 | 1.07 | 32750 | 0.3567 | 0.1972 | 0.0796 |
0.2523 | 1.08 | 33000 | 0.3367 | 0.1971 | 0.0801 |
0.1716 | 1.08 | 33250 | 0.3476 | 0.1975 | 0.0799 |
0.168 | 1.09 | 33500 | 0.3346 | 0.1951 | 0.0790 |
0.1995 | 1.1 | 33750 | 0.3564 | 0.1971 | 0.0794 |
0.198 | 1.11 | 34000 | 0.3409 | 0.1988 | 0.0796 |
0.1801 | 1.12 | 34250 | 0.3303 | 0.1995 | 0.0798 |
0.181 | 1.12 | 34500 | 0.3363 | 0.1967 | 0.0794 |
0.1966 | 1.13 | 34750 | 0.3375 | 0.1947 | 0.0784 |
0.2163 | 1.14 | 35000 | 0.3441 | 0.2011 | 0.0810 |
0.2285 | 1.15 | 35250 | 0.3303 | 0.1972 | 0.0801 |
0.1814 | 1.16 | 35500 | 0.3462 | 0.1895 | 0.0772 |
0.2127 | 1.17 | 35750 | 0.3393 | 0.1904 | 0.0775 |
0.1795 | 1.17 | 36000 | 0.3374 | 0.1928 | 0.0780 |
0.2062 | 1.18 | 36250 | 0.3286 | 0.1929 | 0.0783 |
0.172 | 1.19 | 36500 | 0.3334 | 0.1929 | 0.0781 |
0.1534 | 1.2 | 36750 | 0.3287 | 0.1895 | 0.0763 |
0.2101 | 1.21 | 37000 | 0.3261 | 0.1888 | 0.0764 |
0.2342 | 1.21 | 37250 | 0.3413 | 0.2007 | 0.0812 |
0.1692 | 1.22 | 37500 | 0.3375 | 0.1932 | 0.0780 |
0.165 | 1.23 | 37750 | 0.3220 | 0.1903 | 0.0767 |
0.2067 | 1.24 | 38000 | 0.3212 | 0.1855 | 0.0754 |
0.1984 | 1.25 | 38250 | 0.3339 | 0.1890 | 0.0762 |
0.2117 | 1.25 | 38500 | 0.3224 | 0.1900 | 0.0761 |
0.2036 | 1.26 | 38750 | 0.3410 | 0.1923 | 0.0790 |
0.2072 | 1.27 | 39000 | 0.3291 | 0.1904 | 0.0770 |
0.1962 | 1.28 | 39250 | 0.3237 | 0.1908 | 0.0770 |
0.2055 | 1.29 | 39500 | 0.3260 | 0.1896 | 0.0767 |
0.1753 | 1.3 | 39750 | 0.3375 | 0.1915 | 0.0777 |
0.1983 | 1.3 | 40000 | 0.3236 | 0.1850 | 0.0750 |
0.173 | 1.31 | 40250 | 0.3253 | 0.1870 | 0.0754 |
0.1773 | 1.32 | 40500 | 0.3316 | 0.1923 | 0.0766 |
0.1649 | 1.33 | 40750 | 0.3218 | 0.1842 | 0.0749 |
0.1806 | 1.34 | 41000 | 0.3161 | 0.1907 | 0.0769 |
0.1639 | ... | ... | ... | ... | ... |

*(table truncated; training continued beyond this point)*
## Technical Details

The model is based on the pre-trained model ylacombe/w2v-bert-2.0 and fine-tuned on the MOZILLA-FOUNDATION/COMMON_VOICE_16_0 - EN dataset. The training process uses the hyperparameters listed above to optimize the model's performance.
## License

The original README does not include license information. Check the official repository for license details.

