wav2vec2-base-MIR_ST500_ASR_109 Open-source Automatic Speech Recognition Model

Wav2vec2 Base MIR ST500 ASR 109

Developed by gary109

A fine-tuned automatic speech recognition model based on facebook/wav2vec2-base on the MIR_ST500 dataset

Speech Recognition

Transformers

Open Source License: Apache-2.0 #Speech-to-text #Multi-GPU training #Low word error rate

Downloads 15

Release Time: 4/15/2022

Model Overview

This model is a version of facebook/wav2vec2-base fine-tuned for automatic speech recognition (ASR) on the MIR_ST500 dataset. It converts speech to text.

Model Features

Based on wav2vec2 architecture

Uses facebook/wav2vec2-base as the foundation model, which provides the speech feature extraction layers that are then fine-tuned

Domain-specific fine-tuning

Fine-tuned on the MIR_ST500 dataset, potentially optimized for specific domains or accents

Multi-GPU training

Trained with distributed multi-GPU training on 2 devices, giving a total batch size of 16 (8 per device)

Model Capabilities

Speech-to-text

Automatic speech recognition

Use Cases

Speech transcription

Meeting minutes

Automatically convert meeting recordings into written transcripts

Voice notes

Convert voice memos into searchable text
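A transcription call for the use cases above might look like the following minimal sketch. It assumes the model is published on the Hugging Face Hub under the ID gary109/wav2vec2-base-MIR_ST500_ASR_109 (inferred from the developer and model names on this page, not confirmed by it) and that the transformers library and ffmpeg are installed. The function name `transcribe` is illustrative only.

```python
# Assumption: Hub ID inferred from the developer name and model name on this page.
MODEL_ID = "gary109/wav2vec2-base-MIR_ST500_ASR_109"


def transcribe(audio_path, model_id=MODEL_ID):
    """Return the text recognized in an audio file.

    The import is kept inside the function so that this module can be
    loaded even when transformers is not installed.
    """
    from transformers import pipeline

    # The ASR pipeline decodes the file with ffmpeg and resamples it to the
    # sampling rate expected by the model's feature extractor.
    asr = pipeline("automatic-speech-recognition", model=model_id)
    return asr(audio_path)["text"]
```

Calling `transcribe("memo.wav")` (a hypothetical file name) would download the checkpoint on first use and return the transcript as a string.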

🚀 wav2vec2-base-MIR_ST500_ASR_109

This model is a fine-tuned version of facebook/wav2vec2-base on the /WORKSPACE/DATASETS/DATASETS/MIR_ST500/MIR_ST500.PY - ASR dataset. It achieves the following results on the evaluation set:

Loss: 0.6452
Wer: 0.3732
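
For readers unfamiliar with the metric, Wer is the word error rate: the word-level edit distance between the reference and the model output, divided by the number of reference words. The sketch below is one common way to compute it and is not taken from this model's evaluation script; the function name `wer` and the example sentences are illustrative.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution (sat -> sit) and one deletion (the) over six reference words.
print(wer("the cat sat on the mat", "the cat sit on mat"))
# An empty hypothesis counts every reference word as a deletion.
print(wer("a b c", ""))
```

Under this definition an empty output scores exactly 1.0, and a value of 0.3732 corresponds to roughly 37 word errors per 100 reference words.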

📚 Documentation

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

🔧 Technical Details

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 3e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 2
total_train_batch_size: 16
total_eval_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
num_epochs: 30.0
mixed_precision_training: Native AMP
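
Two of the values listed above follow from the others, and the short sketch below works through them: the total batch size as per-device batch size times device count, and the learning rate produced by a linear schedule with 500 warmup steps. The total step count is an assumption, estimated from the results table (step 11200 falls at epoch 29.79, so about 376 steps per epoch and about 11280 steps over 30 epochs). The function names are illustrative.

```python
LEARNING_RATE = 3e-05
PER_DEVICE_BATCH_SIZE = 8
NUM_DEVICES = 2
WARMUP_STEPS = 500
# Assumption: estimated as 376 steps per epoch * 30 epochs; not stated in the card.
TOTAL_STEPS = 11280


def total_batch_size(per_device, num_devices, grad_accum_steps=1):
    """Number of samples that contribute to one optimizer step."""
    return per_device * num_devices * grad_accum_steps


def linear_lr(step, base_lr=LEARNING_RATE, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Linear warmup from 0 to base_lr, then linear decay to 0 at `total`."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


print(total_batch_size(PER_DEVICE_BATCH_SIZE, NUM_DEVICES))
for step in (0, 250, 500, 5890, TOTAL_STEPS):
    print(step, linear_lr(step))
```

With these numbers the learning rate peaks at 3e-05 at step 500 and then decays towards zero by the end of training.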

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
12.5751	0.27	100	6.0291	1.0
4.343	0.53	200	2.8709	1.0
4.1911	0.8	300	2.5472	1.0
2.4535	1.06	400	2.4323	1.0
2.6157	1.33	500	2.2799	1.0
2.4839	1.6	600	2.2722	1.0
2.2787	1.86	700	2.2269	1.0
2.1981	2.13	800	2.2221	1.0
2.159	2.39	900	2.1657	1.0
2.1421	2.66	1000	2.1769	1.0
2.0841	2.93	1100	2.1688	1.0
2.0599	3.19	1200	2.1141	1.0
2.0257	3.46	1300	2.0445	1.0
1.979	3.72	1400	2.0180	1.0
1.9366	3.99	1500	1.9419	1.0
1.8547	4.26	1600	1.8765	1.0
1.3988	4.52	1700	1.4151	0.7999
1.1881	4.79	1800	1.1158	0.7347
0.9557	5.05	1900	1.0095	0.6485
0.9087	5.32	2000	0.9644	0.6848
0.8086	5.59	2100	0.8960	0.6119
0.9106	5.85	2200	0.8892	0.5941
0.8252	6.12	2300	0.8333	0.5756
0.8299	6.38	2400	0.8559	0.5838
0.8021	6.65	2500	0.8201	0.5883
0.7979	6.91	2600	0.8349	0.575
0.7223	7.18	2700	0.7883	0.5563
0.6754	7.45	2800	0.7590	0.5393
0.6454	7.71	2900	0.7411	0.5291
0.6228	7.98	3000	0.7464	0.5300
0.6475	8.24	3100	0.7478	0.5295
0.6452	8.51	3200	0.7555	0.5360
0.5636	8.78	3300	0.7369	0.5232
0.564	9.04	3400	0.7331	0.5076
0.6173	9.31	3500	0.7199	0.5034
0.625	9.57	3600	0.7243	0.5193
0.8122	9.84	3700	0.7436	0.5242
0.5455	10.11	3800	0.7111	0.4920
0.7928	10.37	3900	0.7137	0.4858
0.5446	10.64	4000	0.6874	0.4828
0.4772	10.9	4100	0.6760	0.4801
0.6447	11.17	4200	0.6893	0.4886
0.5818	11.44	4300	0.6789	0.4740
0.4952	11.7	4400	0.7043	0.4811
0.5722	11.97	4500	0.6794	0.4766
0.58	12.23	4600	0.6629	0.4580
0.5432	12.5	4700	0.6907	0.4906
0.4786	12.77	4800	0.6925	0.4854
0.5177	13.03	4900	0.6666	0.4532
0.5448	13.3	5000	0.6744	0.4542
0.5732	13.56	5100	0.6930	0.4986
0.5065	13.83	5200	0.6647	0.4351
0.4005	14.1	5300	0.6659	0.4508
0.4256	14.36	5400	0.6682	0.4533
0.4459	14.63	5500	0.6594	0.4326
0.4645	14.89	5600	0.6615	0.4287
0.4275	15.16	5700	0.6423	0.4299
0.4026	15.43	5800	0.6539	0.4217
0.3507	15.69	5900	0.6555	0.4299
0.3998	15.96	6000	0.6526	0.4213
0.4462	16.22	6100	0.6469	0.4230
0.4095	16.49	6200	0.6516	0.4210
0.4452	16.76	6300	0.6373	0.4133
0.3997	17.02	6400	0.6456	0.4211
0.3826	17.29	6500	0.6278	0.4042
0.3867	17.55	6600	0.6459	0.4112
0.4367	17.82	6700	0.6464	0.4131
0.3887	18.09	6800	0.6567	0.4150
0.3481	18.35	6900	0.6548	0.4145
0.4241	18.62	7000	0.6490	0.4123
0.3742	18.88	7100	0.6561	0.4135
0.423	19.15	7200	0.6498	0.4051
0.3803	19.41	7300	0.6475	0.3903
0.3084	19.68	7400	0.6403	0.4042
0.3012	19.95	7500	0.6460	0.4004
0.3306	20.21	7600	0.6491	0.3837
0.3612	20.48	7700	0.6752	0.3884
0.3572	20.74	7800	0.6383	0.3793
0.3638	21.01	7900	0.6349	0.3838
0.3658	21.28	8000	0.6544	0.3793
0.3726	21.54	8100	0.6567	0.3756
0.3618	21.81	8200	0.6390	0.3795
0.3212	22.07	8300	0.6359	0.3768
0.3561	22.34	8400	0.6452	0.3732
0.3231	22.61	8500	0.6416	0.3731
0.3764	22.87	8600	0.6428	0.3697
0.4142	23.14	8700	0.6415	0.3665
0.2713	23.4	8800	0.6541	0.3676
0.2277	23.67	8900	0.6492	0.3684
0.3849	23.94	9000	0.6448	0.3651
0.266	24.2	9100	0.6602	0.3643
0.3464	24.47	9200	0.6673	0.3607
0.2919	24.73	9300	0.6557	0.3677
0.2878	25.0	9400	0.6377	0.3653
0.1603	25.27	9500	0.6598	0.3700
0.2055	25.53	9600	0.6558	0.3614
0.1508	25.8	9700	0.6543	0.3605
0.3162	26.06	9800	0.6570	0.3576
0.2613	26.33	9900	0.6604	0.3584
0.2244	26.6	10000	0.6618	0.3634
0.1585	26.86	10100	0.6698	0.3634
0.2959	27.13	10200	0.6709	0.3593
0.2778	27.39	10300	0.6638	0.3537
0.2354	27.66	10400	0.6770	0.3585
0.2992	27.93	10500	0.6698	0.3506
0.2664	28.19	10600	0.6725	0.3533
0.2582	28.46	10700	0.6689	0.3542
0.2096	28.72	10800	0.6731	0.3527
0.4169	28.99	10900	0.6691	0.3521
0.2716	29.26	11000	0.6712	0.3517
0.2944	29.52	11100	0.6708	0.3509
0.2737	29.79	11200	0.6699	0.3491
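
The table is easier to read with a little tooling. The sketch below parses rows in the same column order (Training Loss, Epoch, Step, Validation Loss, Wer) and picks out a few checkpoints of interest. Only a six-row excerpt is embedded to keep the example short; the names `Row`, `EXCERPT`, and `parse_rows` are illustrative.

```python
from typing import NamedTuple


class Row(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float
    wer: float


# Excerpt copied from the training results table (not the full log).
EXCERPT = """
1.8547 4.26 1600 1.8765 1.0
1.3988 4.52 1700 1.4151 0.7999
0.3826 17.29 6500 0.6278 0.4042
0.3561 22.34 8400 0.6452 0.3732
0.2992 27.93 10500 0.6698 0.3506
0.2737 29.79 11200 0.6699 0.3491
"""


def parse_rows(text):
    """Turn whitespace- or tab-separated table lines into Row records."""
    rows = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss, wer = line.split()
        rows.append(
            Row(float(train_loss), float(epoch), int(step), float(val_loss), float(wer))
        )
    return rows


rows = parse_rows(EXCERPT)
best_wer = min(rows, key=lambda r: r.wer)
best_loss = min(rows, key=lambda r: r.val_loss)
first_below_one = next(r for r in rows if r.wer < 1.0)

print("lowest Wer:", best_wer.step, best_wer.wer)
print("lowest validation loss:", best_loss.step, best_loss.val_loss)
print("first step with Wer below 1.0:", first_below_one.step)
```

Read against the full table, the same three rows stand out: Wer first drops below 1.0 at step 1700, validation loss is lowest at step 6500 (0.6278), and Wer is lowest at the final logged step 11200 (0.3491). The headline evaluation figures (Loss 0.6452, Wer 0.3732) coincide with the row at step 8400.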

Framework versions

Transformers 4.18.0
PyTorch 1.9.1+cu102
Datasets 2.0.0
Tokenizers 0.11.6
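
To compare a local environment against the versions listed above, a small check along the following lines may help. It is a sketch using only the standard library; the distribution names (in particular `torch` for PyTorch) are the usual pip names, and `check_versions` is an illustrative helper, not part of any of these libraries.

```python
from importlib import metadata

# Versions listed in this card, keyed by pip distribution name.
EXPECTED = {
    "transformers": "4.18.0",
    "torch": "1.9.1+cu102",
    "datasets": "2.0.0",
    "tokenizers": "0.11.6",
}


def check_versions(expected=EXPECTED):
    """Map each package to (wanted, found, matches); found is None if not installed."""
    report = {}
    for name, wanted in expected.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        report[name] = (wanted, found, found == wanted)
    return report


for name, (wanted, found, matches) in check_versions().items():
    status = "ok" if matches else "differs"
    print(f"{name}: wanted {wanted}, found {found} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; it only flags a difference from the training environment.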

📄 License

This model is licensed under the Apache 2.0 license.
