Open-source wav2vec2-speechdat model - A highly practical Swedish automatic speech recognition tool

Wav2vec2 Speechdat

Developed by birgermoell

This Swedish automatic speech recognition model is a version of facebook/wav2vec2-large-xlsr-53 fine-tuned on the COMMON_VOICE - SV-SE dataset.

Speech Recognition

Transformers

Open Source License: Apache-2.0 #Swedish speech recognition #High-precision WER #Multi-dialect adaptation

Downloads 29

Release Time: 3/2/2022

Model Overview

This is an automatic speech recognition (ASR) model for Swedish, based on the wav2vec2 architecture and fine-tuned on the Common Voice Swedish dataset.

Model Features

Swedish optimization

Fine-tuned specifically for Swedish speech recognition, reaching a word error rate (WER) of 0.2927 on the evaluation set

Based on wav2vec2 architecture

Uses facebook/wav2vec2-large-xlsr-53, a wav2vec2 model pretrained on speech in 53 languages, as the base model for speech feature extraction

Trained on Common Voice dataset

Fine-tuned on the Swedish (SV-SE) portion of the Common Voice dataset

Model Capabilities

Swedish speech recognition

Speech-to-text

Use Cases

Speech transcription

Swedish speech transcription

Convert Swedish speech content to text

Achieved a word error rate (WER) of 0.2927 on the evaluation set

Voice assistant

Swedish voice command recognition

Used for command recognition in Swedish voice assistant systems

🚀 wav2vec2-speechdat

This is a fine-tuned model for Swedish automatic speech recognition. It is based on the facebook/wav2vec2-large-xlsr-53 model and trained on the COMMON_VOICE - SV-SE dataset, achieving a loss of 0.4578 and a WER of 0.2927 on the evaluation set.

🚀 Quick Start

This model can be used directly for Swedish automatic speech recognition tasks. You can load the model through the Hugging Face Transformers library and perform inference.
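The loading and inference step described above can be sketched as follows. This is a minimal example, not the author's published snippet: the Hub identifier `birgermoell/wav2vec2-speechdat` is an assumption pieced together from the developer and model names on this page, and `transcribe` is a hypothetical helper name. It uses the Transformers `pipeline` API, which typically needs ffmpeg installed to decode an audio file path.

```python
# ASSUMPTION: Hub id built from the developer name and model name shown on this page.
MODEL_ID = "birgermoell/wav2vec2-speechdat"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Return the transcript of one audio file (hypothetical helper)."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    # Builds an automatic-speech-recognition pipeline around the checkpoint;
    # the weights are downloaded from the Hub on first use.
    asr = pipeline("automatic-speech-recognition", model=model_id)

    # For a single input the pipeline returns a dict with a "text" field.
    return asr(audio_path)["text"]
```

Calling `transcribe("sample.wav")`, where `sample.wav` is a placeholder file name, would return the recognized Swedish text as a string. For many files it would be cheaper to build the pipeline once and reuse it rather than recreate it on every call.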

✨ Features

Fine-tuned on Swedish data: Trained on the COMMON_VOICE - SV-SE dataset, which adapts the multilingual base model to Swedish speech recognition.
Evaluation results: Achieves a loss of 0.4578 and a WER of 0.2927 on the evaluation set.
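To make the WER figure concrete: word error rate is the number of word substitutions, deletions, and insertions needed to turn the model output into the reference transcript, divided by the number of reference words. A WER of 0.2927 therefore corresponds to roughly 29 word errors per 100 reference words. The sketch below computes it for a single sentence pair in plain Python. The function name `word_error_rate` is illustrative, and this is not necessarily the exact metric implementation used for the reported number (corpus-level WER is usually computed over all utterances together rather than averaged per sentence).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference prefix
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, the reference "jag vill ha en kopp kaffe" against the hypothesis "jag vill en kopp kaffe" has one deleted word out of six, giving a WER of 1/6.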

📚 Documentation

Model Information

Property	Details
Model Type	Fine-tuned version of facebook/wav2vec2-large-xlsr-53
Training Data	COMMON_VOICE - SV-SE dataset

Training Hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 16
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
num_epochs: 15.0
mixed_precision_training: Native AMP
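Two of these values follow from the others, which a short calculation can illustrate. The total train batch size of 32 is the per-device batch size (16) multiplied by the gradient accumulation steps (2), assuming a single device. The linear scheduler ramps the learning rate from 0 to 0.0003 over the 500 warmup steps and then decays it linearly to 0 at the end of training. The sketch below reproduces that shape in plain Python. The total step count is an assumption: it is estimated from the results table, where epoch 1.0 falls at about step 9300, and is not stated in the source.

```python
LEARNING_RATE = 3e-4
WARMUP_STEPS = 500
TRAIN_BATCH_SIZE = 16
GRAD_ACCUM_STEPS = 2
NUM_EPOCHS = 15

# ASSUMPTION: about 9300 steps per epoch, read off the training results table.
STEPS_PER_EPOCH = 9300
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Samples contributing to one optimizer update."""
    return per_device * accum_steps * num_devices


def linear_lr(step: int, base_lr: float = LEARNING_RATE,
              warmup: int = WARMUP_STEPS, total: int = TOTAL_STEPS) -> float:
    """Linear warmup followed by linear decay to zero."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


if effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS) != 32:
    raise AssertionError("effective batch size should match total_train_batch_size")
```

Under these assumptions the learning rate peaks at step 500 and has decayed only slightly by step 20000, the last row shown in the results table.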

Training Results

Training Loss	Epoch	Step	Validation Loss	WER
No log	0.01	100	3.6252	1.0
No log	0.02	200	3.1906	1.0
No log	0.03	300	3.1090	1.0
No log	0.04	400	1.8796	0.9955
6.2575	0.05	500	1.3515	0.9058
6.2575	0.06	600	1.1209	0.8328
6.2575	0.07	700	1.1404	0.8309
6.2575	0.09	800	1.0599	0.8021
6.2575	0.1	900	0.9901	0.8335
0.7737	0.11	1000	0.8846	0.7400
0.7737	0.12	1100	0.9971	0.7820
0.7737	0.13	1200	0.8665	0.7123
0.7737	0.14	1300	0.8490	0.7366
0.7737	0.15	1400	0.8250	0.6765
0.6183	0.16	1500	0.8291	0.6965
0.6183	0.17	1600	0.7946	0.6823
0.6183	0.18	1700	0.8239	0.6894
0.6183	0.19	1800	0.8282	0.6796
0.6183	0.2	1900	0.7645	0.6518
0.561	0.21	2000	0.7530	0.6367
0.561	0.22	2100	0.7296	0.6177
0.561	0.24	2200	0.7527	0.6498
0.561	0.25	2300	0.7210	0.6316
0.561	0.26	2400	0.7938	0.6757
0.5402	0.27	2500	0.7485	0.6372
0.5402	0.28	2600	0.7146	0.6133
0.5402	0.29	2700	0.7308	0.6626
0.5402	0.3	2800	0.7078	0.5949
0.5402	0.31	2900	0.7679	0.6373
0.5303	0.32	3000	0.7263	0.6502
0.5303	0.33	3100	0.6613	0.5846
0.5303	0.34	3200	0.6784	0.5783
0.5303	0.35	3300	0.6908	0.5833
0.5303	0.36	3400	0.6595	0.5826
0.503	0.37	3500	0.6717	0.5938
0.503	0.39	3600	0.6938	0.5791
0.503	0.4	3700	0.6677	0.6052
0.503	0.41	3800	0.6544	0.5554
0.503	0.42	3900	0.6514	0.5728
0.4959	0.43	4000	0.6847	0.6188
0.4959	0.44	4100	0.6626	0.5869
0.4959	0.45	4200	0.6670	0.5700
0.4959	0.46	4300	0.6596	0.5846
0.4959	0.47	4400	0.6523	0.5468
0.4824	0.48	4500	0.6392	0.5688
0.4824	0.49	4600	0.6561	0.5687
0.4824	0.5	4700	0.6697	0.5817
0.4824	0.51	4800	0.6348	0.5608
0.4824	0.52	4900	0.6561	0.5600
0.4714	0.54	5000	0.6522	0.6181
0.4714	0.55	5100	0.6858	0.5921
0.4714	0.56	5200	0.6706	0.5497
0.4714	0.57	5300	0.7123	0.5768
0.4714	0.58	5400	0.6599	0.6100
0.471	0.59	5500	0.6421	0.5626
0.471	0.6	5600	0.6395	0.5753
0.471	0.61	5700	0.6788	0.5481
0.471	0.62	5800	0.6386	0.5516
0.471	0.63	5900	0.6694	0.5913
0.4707	0.64	6000	0.6251	0.5699
0.4707	0.65	6100	0.6243	0.5567
0.4707	0.66	6200	0.6645	0.5629
0.4707	0.67	6300	0.6296	0.5895
0.4707	0.69	6400	0.6078	0.5183
0.4632	0.7	6500	0.6270	0.5619
0.4632	0.71	6600	0.6050	0.5336
0.4632	0.72	6700	0.6185	0.5449
0.4632	0.73	6800	0.6281	0.5645
0.4632	0.74	6900	0.5877	0.5084
0.4514	0.75	7000	0.6199	0.5403
0.4514	0.76	7100	0.6293	0.5275
0.4514	0.77	7200	0.6290	0.5447
0.4514	0.78	7300	0.6130	0.5373
0.4514	0.79	7400	0.6138	0.5285
0.4457	0.8	7500	0.6040	0.5259
0.4457	0.81	7600	0.6220	0.5686
0.4457	0.82	7700	0.5915	0.5164
0.4457	0.84	7800	0.6270	0.5289
0.4457	0.85	7900	0.6224	0.5515
0.4458	0.86	8000	0.6161	0.5323
0.4458	0.87	8100	0.5827	0.5122
0.4458	0.88	8200	0.6067	0.5202
0.4458	0.89	8300	0.6087	0.5192
0.4458	0.9	8400	0.6859	0.5796
0.4409	0.91	8500	0.6180	0.5131
0.4409	0.92	8600	0.5945	0.4948
0.4409	0.93	8700	0.5967	0.5532
0.4409	0.94	8800	0.5770	0.4961
0.4409	0.95	8900	0.5809	0.5203
0.4305	0.96	9000	0.5805	0.5039
0.4305	0.97	9100	0.5873	0.5188
0.4305	0.98	9200	0.6277	0.5516
0.4305	1.0	9300	0.5727	0.5052
0.4305	1.01	9400	0.5858	0.5123
0.4264	1.02	9500	0.5692	0.4968
0.4264	1.03	9600	0.5954	0.5117
0.4264	1.04	9700	0.5904	0.5076
0.4264	1.05	9800	0.6046	0.5101
0.4264	1.06	9900	0.5616	0.4926
0.4176	1.07	10000	0.5971	0.5368
0.4176	1.08	10100	0.5706	0.4940
0.4176	1.09	10200	0.5612	0.5032
0.4176	1.1	10300	0.5672	0.4944
0.4176	1.11	10400	0.5915	0.5218
0.4033	1.12	10500	0.5706	0.5051
0.4033	1.13	10600	0.5661	0.4934
0.4033	1.15	10700	0.5724	0.4903
0.4033	1.16	10800	0.5792	0.4940
0.4033	1.17	10900	0.5744	0.4911
0.392	1.18	11000	0.5767	0.5162
0.392	1.19	11100	0.5588	0.4835
0.392	1.2	11200	0.5609	0.4922
0.392	1.21	11300	0.5890	0.4914
0.392	1.22	11400	0.5525	0.4897
0.387	1.23	11500	0.5704	0.5051
0.387	1.24	11600	0.5539	0.5014
0.387	1.25	11700	0.5473	0.4882
0.387	1.26	11800	0.5662	0.5004
0.387	1.27	11900	0.5785	0.5220
0.3956	1.28	12000	0.5990	0.5114
0.3956	1.3	12100	0.5497	0.4895
0.3956	1.31	12200	0.5538	0.4895
0.3956	1.32	12300	0.5652	0.4913
0.3956	1.33	12400	0.5682	0.5128
0.4043	1.34	12500	0.5830	0.4999
0.4043	1.35	12600	0.5686	0.4865
0.4043	1.36	12700	0.5688	0.4937
0.4043	1.37	12800	0.5753	0.5034
0.4043	1.38	12900	0.5898	0.4865
0.3997	1.39	13000	0.5723	0.4963
0.3997	1.4	13100	0.5767	0.4986
0.3997	1.41	13200	0.5960	0.5084
0.3997	1.42	13300	0.5859	0.5096
0.3997	1.43	13400	0.5491	0.4784
0.3997	1.45	13500	0.5636	0.5049
0.3997	1.46	13600	0.5667	0.4708
0.3997	1.47	13700	0.5757	0.4862
0.3997	1.48	13800	0.5444	0.4816
0.3997	1.49	13900	0.5557	0.4792
0.3954	1.5	14000	0.5437	0.4810
0.3954	1.51	14100	0.5489	0.4674
0.3954	1.52	14200	0.5415	0.4674
0.3954	1.53	14300	0.5481	0.4902
0.3954	1.54	14400	0.5474	0.4763
0.3814	1.55	14500	0.5588	0.4731
0.3814	1.56	14600	0.5746	0.4820
0.3814	1.57	14700	0.5676	0.4884
0.3814	1.58	14800	0.5495	0.4711
0.3814	1.6	14900	0.5565	0.4782
0.3877	1.61	15000	0.5671	0.5135
0.3877	1.62	15100	0.5512	0.4868
0.3877	1.63	15200	0.5683	0.4650
0.3877	1.64	15300	0.5427	0.4717
0.3877	1.65	15400	0.5519	0.4651
0.387	1.66	15500	0.5327	0.4456
0.387	1.67	15600	0.5371	0.4673
0.387	1.68	15700	0.5337	0.4705
0.387	1.69	15800	0.5606	0.4992
0.387	1.7	15900	0.5254	0.4613
0.3877	1.71	16000	0.5619	0.4882
0.3877	1.72	16100	0.5212	0.4560
0.3877	1.73	16200	0.5369	0.4696
0.3877	1.75	16300	0.5392	0.4677
0.3877	1.76	16400	0.5353	0.4768
0.3739	1.77	16500	0.5435	0.4777
0.3739	1.78	16600	0.5343	0.4884
0.3739	1.79	16700	0.5309	0.4942
0.3739	1.8	16800	0.5373	0.4727
0.3739	1.81	16900	0.5550	0.4686
0.3884	1.82	17000	0.5486	0.4826
0.3884	1.83	17100	0.5508	0.4862
0.3884	1.84	17200	0.5423	0.4855
0.3884	1.85	17300	0.5478	0.4730
0.3884	1.86	17400	0.5438	0.4938
0.3842	1.87	17500	0.5571	0.4818
0.3842	1.88	17600	0.5402	0.4753
0.3842	1.9	17700	0.5679	0.4827
0.3842	1.91	17800	0.5385	0.4642
0.3842	1.92	17900	0.5519	0.4942
0.3953	1.93	18000	0.5559	0.4745
0.3953	1.94	18100	0.5657	0.4963
0.3953	1.95	18200	0.5296	0.4642
0.3953	1.96	18300	0.5529	0.4907
0.3953	1.97	18400	0.5380	0.4536
0.3745	1.98	18500	0.5276	0.4678
0.3745	1.99	18600	0.5544	0.4854
0.3745	2.0	18700	0.5195	0.4535
0.3745	2.01	18800	0.5494	0.4740
0.3745	2.02	18900	0.5359	0.4673
0.3745	2.03	19000	0.5312	0.4568
0.3745	2.04	19100	0.5397	0.4626
0.3745	2.05	19200	0.5331	0.4697
0.3745	2.06	19300	0.5288	0.4609
0.3745	2.07	19400	0.5361	0.4639
0.3745	2.08	19500	0.5233	0.4587
0.3745	2.09	19600	0.5303	0.4670
0.3745	2.1	19700	0.5224	0.4539
0.3745	2.11	19800	0.5327	0.4618
0.3745	2.12	19900	0.5263	0.4562
0.3745	2.13	20000	0.5222	0.4530
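A log like the one above is easier to inspect programmatically. The following sketch parses tab-separated rows in the same column order and picks the checkpoints with the lowest WER and the lowest validation loss. Only a handful of rows from the table are embedded as sample data, and the names `EvalRow` and `parse_rows` are illustrative rather than part of any library.

```python
from typing import List, NamedTuple, Optional


class EvalRow(NamedTuple):
    training_loss: Optional[float]  # None where the log shows "No log"
    epoch: float
    step: int
    validation_loss: float
    wer: float


# A few rows copied from the table above (tab-separated, same column order).
SAMPLE_ROWS = """\
No log\t0.01\t100\t3.6252\t1.0
6.2575\t0.05\t500\t1.3515\t0.9058
0.7737\t0.11\t1000\t0.8846\t0.7400
0.387\t1.66\t15500\t0.5327\t0.4456
0.3745\t2.0\t18700\t0.5195\t0.4535
0.3745\t2.13\t20000\t0.5222\t0.4530
"""


def parse_rows(text: str) -> List[EvalRow]:
    """Turn tab-separated log lines into typed rows."""
    rows = []
    for line in text.strip().splitlines():
        loss, epoch, step, val_loss, wer = line.split("\t")
        rows.append(EvalRow(
            training_loss=None if loss == "No log" else float(loss),
            epoch=float(epoch),
            step=int(step),
            validation_loss=float(val_loss),
            wer=float(wer),
        ))
    return rows


rows = parse_rows(SAMPLE_ROWS)
best_by_wer = min(rows, key=lambda r: r.wer)
best_by_loss = min(rows, key=lambda r: r.validation_loss)
```

Among the sampled rows, the lowest WER and the lowest validation loss occur at different steps, which is a reminder that the two metrics need not select the same checkpoint.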

📄 License

This model is licensed under the Apache 2.0 license.
