wav2vec2-large-xls-r-300m-spanish-custom Open-source Speech Recognition Model

Developed by tomascufaro

This speech recognition model is facebook/wav2vec2-xls-r-300m fine-tuned on the Spanish portion of the Common Voice dataset. It achieves a word error rate of 21.17% on the evaluation set.

Speech Recognition

Transformers

Open Source License: Apache-2.0

Tags: Spanish speech recognition, large model fine-tuning, low word error rate

Release Time: 3/2/2022

Model Overview

This is an automatic speech recognition (ASR) model fine-tuned for Spanish. It transcribes Spanish speech into text.

Model Features

Optimized for Spanish

Specifically fine-tuned on Spanish speech data, improving the accuracy of Spanish recognition.

Based on wav2vec2-xls-r architecture

Builds on XLS-R, the wav2vec 2.0 based model from Facebook AI for large-scale self-supervised speech representation learning.

Relatively lightweight

With about 300M parameters, it is the smallest of the XLS-R checkpoints (the others have 1B and 2B parameters), so it needs less compute and memory than the larger variants.

Model Capabilities

Spanish speech recognition

Speech-to-text

Audio content transcription

Use Cases

Speech transcription

Meeting minutes

Automatically converts Spanish meeting recordings into text transcripts.

Achieves a 21.17% word error rate on the evaluation set.

Voice assistant

Used as a speech recognition component for Spanish voice assistant applications.

Accessibility applications

Real-time caption generation

Generates real-time captions for Spanish video content.

🚀 wav2vec2-large-xls-r-300m-spanish-custom

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the common_voice dataset. On the evaluation set it reaches a loss of 0.4426 and a word error rate (WER) of 0.2117.
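
For readers unfamiliar with the metric, WER is the number of word-level substitutions, deletions, and insertions needed to turn the model output into the reference transcript, divided by the number of reference words. The following is a minimal plain-Python sketch of that definition. The function name and the example sentences are made up for illustration, and the evaluation behind the figure above may have used a different implementation and different text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sentences: one substituted word in a five-word reference.
print(word_error_rate("el gato duerme en casa", "el gato duerme en cama"))
```

Read this way, a WER of 0.2117 corresponds to roughly one word error for every five reference words.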

🚀 Quick Start

On the evaluation set, the model achieves the following results:

Loss: 0.4426
Wer: 0.2117
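
The page gives no usage snippet, so here is a minimal sketch of how a checkpoint like this is commonly loaded with the Transformers `pipeline` API. The repository ID is an assumption pieced together from the developer and model names on this page, and the audio path in the comment is a placeholder. Decoding an audio file from a path also requires ffmpeg to be installed.

```python
# Assumed Hub repository ID, built from the developer and model names on this page.
MODEL_ID = "tomascufaro/wav2vec2-large-xls-r-300m-spanish-custom"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file with the Transformers ASR pipeline."""
    # Imported lazily so this module loads even where transformers is not installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # Given a file path, the pipeline decodes the audio and returns a dict with a "text" key.
    result = asr(audio_path)
    return result["text"]


# Example call with a placeholder path (not a file shipped with the model):
# print(transcribe("sample_es.wav"))
```

The first call downloads the model weights, so building the pipeline once and reusing it is preferable when transcribing many files.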

🔧 Technical Details

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 8
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
num_epochs: 30
mixed_precision_training: Native AMP
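
Two of these settings involve a little arithmetic: the total batch size is the per-device batch size times the gradient accumulation steps, and a linear scheduler with warmup ramps the learning rate from zero to its peak and then decays it linearly to zero. The sketch below illustrates both. It assumes a single training device (consistent with 8 × 2 = 16), and the total step count is an estimate extrapolated from the training log (step 30000 at epoch 29.67), not a value stated on this page.

```python
LEARNING_RATE = 3e-4
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 2
WARMUP_STEPS = 500
# Assumption: estimated total optimizer steps for 30 epochs; the page does not state it.
TOTAL_STEPS = 30_330


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Number of samples contributing to each optimizer update."""
    return per_device * accum_steps * num_devices


def linear_lr(step: int, base_lr: float = LEARNING_RATE,
              warmup: int = WARMUP_STEPS, total: int = TOTAL_STEPS) -> float:
    """Learning rate at a given step under linear warmup followed by linear decay."""
    if step < warmup:
        # Warmup: scale linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup)
    # Decay: scale linearly from base_lr down to 0 at the final step.
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
for step in (0, 250, 500, 15_000, TOTAL_STEPS):
    print(step, linear_lr(step))
```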

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
4.2307	0.4	400	1.4431	0.9299
0.7066	0.79	800	0.5928	0.4836
0.4397	1.19	1200	0.4341	0.3730
0.3889	1.58	1600	0.4063	0.3499
0.3607	1.98	2000	0.3834	0.3235
0.2866	2.37	2400	0.3885	0.3163
0.2833	2.77	2800	0.3765	0.3140
0.2692	3.17	3200	0.3849	0.3132
0.2435	3.56	3600	0.3779	0.2984
0.2404	3.96	4000	0.3756	0.2934
0.2153	4.35	4400	0.3770	0.3075
0.2087	4.75	4800	0.3819	0.3022
0.1999	5.14	5200	0.3756	0.2959
0.1838	5.54	5600	0.3827	0.2858
0.1892	5.93	6000	0.3714	0.2999
0.1655	6.33	6400	0.3814	0.2812
0.1649	6.73	6800	0.3685	0.2727
0.1668	7.12	7200	0.3832	0.2825
0.1487	7.52	7600	0.3848	0.2788
0.152	7.91	8000	0.3810	0.2787
0.143	8.31	8400	0.3885	0.2856
0.1353	8.7	8800	0.4103	0.2827
0.1386	9.1	9200	0.4142	0.2874
0.1222	9.5	9600	0.3983	0.2830
0.1288	9.89	10000	0.4179	0.2781
0.1199	10.29	10400	0.4035	0.2789
0.1196	10.68	10800	0.4043	0.2746
0.1169	11.08	11200	0.4105	0.2753
0.1076	11.47	11600	0.4298	0.2686
0.1124	11.87	12000	0.4025	0.2704
0.1043	12.26	12400	0.4209	0.2659
0.0976	12.66	12800	0.4070	0.2672
0.1012	13.06	13200	0.4161	0.2720
0.0872	13.45	13600	0.4245	0.2697
0.0933	13.85	14000	0.4295	0.2684
0.0881	14.24	14400	0.4011	0.2650
0.0848	14.64	14800	0.3991	0.2675
0.0852	15.03	15200	0.4166	0.2617
0.0825	15.43	15600	0.4188	0.2639
0.081	15.83	16000	0.4181	0.2547
0.0753	16.22	16400	0.4103	0.2560
0.0747	16.62	16800	0.4017	0.2498
0.0761	17.01	17200	0.4159	0.2563
0.0711	17.41	17600	0.4112	0.2603
0.0698	17.8	18000	0.4335	0.2529
0.073	18.2	18400	0.4120	0.2512
0.0665	18.6	18800	0.4335	0.2496
0.0657	18.99	19200	0.4143	0.2468
0.0617	19.39	19600	0.4339	0.2435
0.06	19.78	20000	0.4179	0.2438
0.0613	20.18	20400	0.4251	0.2393
0.0583	20.57	20800	0.4347	0.2422
0.0562	20.97	21200	0.4246	0.2377
0.053	21.36	21600	0.4198	0.2338
0.0525	21.76	22000	0.4511	0.2427
0.0499	22.16	22400	0.4482	0.2353
0.0475	22.55	22800	0.4449	0.2329
0.0465	22.95	23200	0.4364	0.2320
0.0443	23.34	23600	0.4481	0.2304
0.0458	23.74	24000	0.4442	0.2267
0.0453	24.13	24400	0.4402	0.2261
0.0426	24.53	24800	0.4262	0.2232
0.0431	24.93	25200	0.4251	0.2210
0.0389	25.32	25600	0.4455	0.2232
0.039	25.72	26000	0.4372	0.2236
0.0378	26.11	26400	0.4236	0.2212
0.0348	26.51	26800	0.4359	0.2204
0.0361	26.9	27200	0.4248	0.2192
0.0356	27.3	27600	0.4397	0.2184
0.0325	27.7	28000	0.4367	0.2181
0.0313	28.09	28400	0.4477	0.2136
0.0306	28.49	28800	0.4533	0.2135
0.0314	28.88	29200	0.4410	0.2136
0.0307	29.28	29600	0.4457	0.2113
0.0309	29.67	30000	0.4426	0.2117
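
One pattern in this log is worth noting: validation loss bottoms out early (0.3685 at step 6800) and drifts upward afterwards, while WER keeps falling and reaches its minimum of 0.2113 at step 29600, just before the final checkpoint. The short sketch below shows how such a comparison can be made. It uses only a handful of rows copied from the table, and the helper name is illustrative.

```python
# (step, validation loss, WER) for a few rows copied from the table above.
ROWS = [
    (400, 1.4431, 0.9299),
    (6800, 0.3685, 0.2727),
    (16000, 0.4181, 0.2547),
    (24000, 0.4442, 0.2267),
    (29600, 0.4457, 0.2113),
    (30000, 0.4426, 0.2117),
]


def best_row(rows, column):
    """Return the row with the smallest value in the given column index."""
    return min(rows, key=lambda row: row[column])


best_by_loss = best_row(ROWS, 1)
best_by_wer = best_row(ROWS, 2)
print("lowest validation loss:", best_by_loss)
print("lowest WER:", best_by_wer)
```

Which checkpoint counts as best therefore depends on the metric used for selection.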

Framework versions

Transformers 4.16.0.dev0
PyTorch 1.10.1+cu102
Datasets 1.17.1.dev0
Tokenizers 0.11.0
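
To compare a local environment against these versions, a small standard-library check can help. This is only a sketch: the distribution names are the usual PyPI names (an assumption, since the page lists display names), and newer library versions may well load the model without problems.

```python
from importlib import metadata

# Versions listed above, keyed by the usual PyPI distribution names (assumed).
TRAINED_WITH = {
    "transformers": "4.16.0.dev0",
    "torch": "1.10.1+cu102",
    "datasets": "1.17.1.dev0",
    "tokenizers": "0.11.0",
}


def installed_versions(packages=TRAINED_WITH):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


for name, installed in installed_versions().items():
    print(f"{name}: trained with {TRAINED_WITH[name]}, installed {installed}")
```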

📄 License

This project uses the Apache 2.0 license.
