# xls-r-spanish-test
This is a fine-tuned model for automatic speech recognition in Spanish, reaching a test word error rate (WER) of 13.89 on the Common Voice 7 Spanish test set.
## Quick Start
This model is a fine-tuned version of facebook/wav2vec2-large-xlsr-53 on the MOZILLA-FOUNDATION/COMMON_VOICE_7_0 - ES dataset. It achieves the evaluation results listed in the Model Index and Training Results sections below.
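The card does not include a usage snippet, so here is a minimal inference sketch using the standard wav2vec2 CTC API from transformers; the repository id (`your-username/xls-r-spanish-test`) and the audio file name are placeholders, not values from this card:

```python
import torch
import librosa
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Hypothetical Hub path -- replace with the actual repository id of this model.
MODEL_ID = "your-username/xls-r-spanish-test"

processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)

# wav2vec2 expects 16 kHz mono audio.
speech, _ = librosa.load("sample_es.wav", sr=16_000)

inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding: take the most likely token at each frame.
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```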
## Features
- Automatic Speech Recognition: Specialized for Spanish speech recognition tasks.
- Fine-tuned: Based on the pre-trained model facebook/wav2vec2-large-xlsr-53, fine-tuned on the MOZILLA-FOUNDATION/COMMON_VOICE_7_0 - ES dataset.
## Documentation
### Model Index

| Task | Dataset | Metric | Value |
|---|---|---|---|
| Speech Recognition | Common Voice 7 (mozilla-foundation/common_voice_7_0, args: es) | Test WER | 13.89 |
| Speech Recognition | Common Voice 7 (mozilla-foundation/common_voice_7_0, args: es) | Test CER | 3.85 |
| Speech Recognition | Robust Speech Event - Dev Data (speech-recognition-community-v2/dev_data, args: es) | Test WER | 37.66 |
| Speech Recognition | Robust Speech Event - Dev Data (speech-recognition-community-v2/dev_data, args: es) | Test CER | 15.32 |
| Automatic Speech Recognition | Robust Speech Event - Test Data (speech-recognition-community-v2/eval_data, args: es) | Test WER | 41.17 |
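The evaluation script behind these numbers is not part of the card; as a sketch, the reported WER/CER could be computed with the metrics bundled in the datasets library of that era (the prediction/reference strings below are placeholders):

```python
from datasets import load_metric

# In Datasets 1.18.x these metrics ship with the library (CER needs the
# jiwer package); in newer releases they moved to the `evaluate` package.
wer_metric = load_metric("wer")
cer_metric = load_metric("cer")

predictions = ["hola mundo"]   # model transcriptions (placeholder)
references = ["hola mundo"]    # ground-truth transcripts (placeholder)

print("WER:", 100 * wer_metric.compute(predictions=predictions, references=references))
print("CER:", 100 * cer_metric.compute(predictions=predictions, references=references))
```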
### Training Procedure

#### Training Hyperparameters
The following hyperparameters were used during training:
- learning_rate: 7.5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2000
- num_epochs: 5.0
- mixed_precision_training: Native AMP
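For readers reproducing this setup, the sketch below shows how these hyperparameters map onto `TrainingArguments` in transformers; the original training script is not included in this card, so the output directory and any unlisted defaults are assumptions:

```python
from transformers import TrainingArguments

# Approximate configuration mirroring the hyperparameter list above;
# treat this as a sketch, not the exact script used for this model.
training_args = TrainingArguments(
    output_dir="./xls-r-spanish-test",   # hypothetical output path
    learning_rate=7.5e-05,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,       # effective train batch size: 8 * 4 = 32
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=2000,
    num_train_epochs=5.0,
    fp16=True,                           # "Native AMP" mixed-precision training
)
```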
#### Training Results

| Training Loss | Epoch | Step | Validation Loss | WER |
|---|---|---|---|---|
| 2.953 | 0.15 | 1000 | 2.9528 | 1.0 |
| 1.1519 | 0.3 | 2000 | 0.3735 | 1.0357 |
| 1.0278 | 0.45 | 3000 | 0.2529 | 1.0390 |
| 0.9922 | 0.61 | 4000 | 0.2208 | 1.0270 |
| 0.9618 | 0.76 | 5000 | 0.2088 | 1.0294 |
| 0.9364 | 0.91 | 6000 | 0.2019 | 1.0214 |
| 0.9179 | 1.06 | 7000 | 0.1940 | 1.0294 |
| 0.9154 | 1.21 | 8000 | 0.1915 | 1.0290 |
| 0.8985 | 1.36 | 9000 | 0.1837 | 1.0211 |
| 0.9055 | 1.51 | 10000 | 0.1838 | 1.0273 |
| 0.8861 | 1.67 | 11000 | 0.1765 | 1.0139 |
| 0.892 | 1.82 | 12000 | 0.1723 | 1.0188 |
| 0.8778 | 1.97 | 13000 | 0.1735 | 1.0092 |
| 0.8645 | 2.12 | 14000 | 0.1707 | 1.0106 |
| 0.8595 | 2.27 | 15000 | 0.1713 | 1.0186 |
| 0.8392 | 2.42 | 16000 | 0.1686 | 1.0053 |
| 0.8436 | 2.57 | 17000 | 0.1653 | 1.0096 |
| 0.8405 | 2.73 | 18000 | 0.1689 | 1.0077 |
| 0.8382 | 2.88 | 19000 | 0.1645 | 1.0114 |
| 0.8247 | 3.03 | 20000 | 0.1647 | 1.0078 |
| 0.8219 | 3.18 | 21000 | 0.1611 | 1.0026 |
| 0.8024 | 3.33 | 22000 | 0.1580 | 1.0062 |
| 0.8087 | 3.48 | 23000 | 0.1578 | 1.0038 |
| 0.8097 | 3.63 | 24000 | 0.1556 | 1.0057 |
| 0.8094 | 3.79 | 25000 | 0.1552 | 1.0035 |
| 0.7836 | 3.94 | 26000 | 0.1516 | 1.0052 |
| 0.8042 | 4.09 | 27000 | 0.1515 | 1.0054 |
| 0.7925 | 4.24 | 28000 | 0.1499 | 1.0031 |
| 0.7855 | 4.39 | 29000 | 0.1490 | 1.0041 |
| 0.7814 | 4.54 | 30000 | 0.1482 | 1.0068 |
| 0.7859 | 4.69 | 31000 | 0.1460 | 1.0066 |
| 0.7819 | 4.85 | 32000 | 0.1464 | 1.0062 |
| 0.7784 | 5.0 | 33000 | 0.1460 | 1.0063 |
### Framework Versions
- Transformers 4.17.0.dev0
- Pytorch 1.10.2+cu102
- Datasets 1.18.3.dev0
- Tokenizers 0.11.0
## License
This model is released under the Apache 2.0 license.