Wav2vec2-xls-r-300m-es open-source model - Free and accurate automatic Spanish speech recognition

Wav2vec2 Xls R 300m Es

Developed by samitizerxu

This is a Spanish automatic speech recognition model, fine-tuned from facebook/wav2vec2-xls-r-300m on the COMMON_VOICE - ES dataset.

Speech Recognition

Transformers

Spanish | Open Source License: Apache-2.0 | #Spanish speech recognition #General speech dataset #Low character error rate

Downloads 23

Release Time: 3/2/2022

Model Overview

A model for Spanish automatic speech recognition, fine-tuned from the wav2vec2-xls-r-300m checkpoint on the Spanish portion of the Common Voice dataset.

Model Features

Multi-dataset evaluation

Evaluated on the Common Voice 7 test set and on the Robust Speech Event dev and test sets

Medium-sized model

Based on the 300M-parameter wav2vec2-xls-r architecture, balancing performance and efficiency

Spanish optimization

Specifically fine-tuned for Spanish speech recognition tasks

Model Capabilities

Spanish speech recognition

Continuous speech-to-text

Multi-scenario speech processing

Use Cases

Speech transcription

Spanish speech-to-text

Convert Spanish speech content into text

Achieved 37.37% WER on the Common Voice 7 test set

Voice assistant

Spanish voice command recognition

Recognize and understand Spanish voice commands

Achieved 57.28% WER on the Robust Speech Event test set

🚀 wav2vec2-xls-r-300m-es

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the COMMON_VOICE - ES dataset, intended for Spanish automatic speech recognition.

🚀 Quick Start

The fine-tuned model achieves the following results on the evaluation set:

Loss: 0.5160
Wer: 0.4016
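The original card does not include an inference snippet, so the following is only a minimal sketch of how the checkpoint might be used for transcription. It assumes the Hugging Face transformers library (with a PyTorch backend and ffmpeg for audio decoding) is installed and that the checkpoint is published under the model ID that appears in the evaluation commands in this document. The function name transcribe and the file name in the usage comment are hypothetical.

```python
from typing import Callable, Optional

# Model ID as it appears in the evaluation commands of this document.
MODEL_ID = "samitizerxu/wav2vec2-xls-r-300m-es"


def transcribe(audio_path: str, asr: Optional[Callable] = None) -> str:
    """Return the transcript of one audio file.

    `asr` is any callable mapping a file path to {"text": ...}. When it is
    omitted, a transformers ASR pipeline is built for MODEL_ID, which
    downloads the checkpoint on first use.
    """
    if asr is None:
        # Imported lazily so this module loads even without transformers.
        from transformers import pipeline

        asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    result = asr(audio_path)
    return result["text"].strip()


# Hypothetical usage (the file name is a placeholder):
# print(transcribe("sample_es.wav"))
```

Passing the recognizer in as an argument keeps the helper easy to test with a stub and lets a caller reuse one loaded pipeline across many files.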

📚 Documentation

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

🔧 Technical Details

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 16
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 8.0
mixed_precision_training: Native AMP
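To make the lr_scheduler_type: linear setting concrete, here is a small sketch of a linear schedule starting from the listed learning rate of 0.0003. Two things are assumptions rather than facts from the card: no warmup steps are listed above, so warmup defaults to zero here, and the total of roughly 3504 optimizer steps is only an estimate inferred from the training results (step 3500 falls at epoch 7.99 of 8). The function name linear_lr is hypothetical.

```python
def linear_lr(step: int, total_steps: int, base_lr: float = 3e-4,
              warmup_steps: int = 0) -> float:
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# ASSUMPTION: about 3504 total steps, estimated from the training results
# table; the card itself does not state the exact number.
TOTAL_STEPS = 3504
for step in (0, 500, 1752, 3500):
    print(step, f"{linear_lr(step, TOTAL_STEPS):.2e}")
```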

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
3.1277	1.14	500	2.0259	0.9999
1.4111	2.28	1000	1.1251	0.8894
0.8461	3.42	1500	0.8205	0.7244
0.5042	4.57	2000	0.6116	0.5463
0.3072	5.71	2500	0.5507	0.4506
0.2181	6.85	3000	0.5213	0.4177
0.1608	7.99	3500	0.5161	0.4019

Framework versions

Transformers 4.17.0.dev0
Pytorch 1.10.2+cu102
Datasets 1.18.2.dev0
Tokenizers 0.11.0

Evaluation Commands

Basic Usage

To evaluate on mozilla-foundation/common_voice_7_0 with split test

python eval.py --model_id samitizerxu/wav2vec2-xls-r-300m-es --dataset mozilla-foundation/common_voice_7_0 --config es --split test

Advanced Usage

To evaluate on speech-recognition-community-v2/dev_data

python eval.py --model_id samitizerxu/wav2vec2-xls-r-300m-es --dataset speech-recognition-community-v2/dev_data --config es --split validation --chunk_length_s 5.0 --stride_length_s 1.0
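The --chunk_length_s and --stride_length_s flags suggest that long recordings are transcribed in overlapping windows. The sketch below illustrates that general idea only; it is not taken from eval.py, and the actual script and the underlying library may differ in details such as rounding window sizes to the model's frame length. It assumes each 5.0 s window shares 1.0 s of context on each side with its neighbours, so the window start advances by 3.0 s. The function name chunk_spans is hypothetical.

```python
from typing import List, Tuple


def chunk_spans(duration_s: float, chunk_length_s: float = 5.0,
                stride_length_s: float = 1.0) -> List[Tuple[float, float]]:
    """Return (start, end) times of overlapping windows covering the audio.

    Each window is at most `chunk_length_s` long, and the window start
    advances by chunk_length_s - 2 * stride_length_s.
    """
    step = chunk_length_s - 2 * stride_length_s
    if step <= 0:
        raise ValueError("chunk_length_s must exceed 2 * stride_length_s")
    spans = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_length_s, duration_s)
        spans.append((start, end))
        if end >= duration_s:
            break  # the last window reached the end of the audio
        start += step
    return spans


# Windows for a hypothetical 12-second recording.
print(chunk_spans(12.0))
```

The overlap gives the model acoustic context at window edges, which is why chunked decoding tends to be preferred over hard cuts for long audio.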

📄 License

This project is released under the Apache-2.0 license.

📊 Model Index

Property	Details
Model Name	wav2vec2-xls-r-300m-es
Task	Automatic Speech Recognition
Dataset 1	Common Voice 7 (mozilla-foundation/common_voice_7_0, args: es), Test WER: 37.37, Test CER: 7.11
Dataset 2	Robust Speech Event - Dev Data (speech-recognition-community-v2/dev_data, args: es), Test WER: 55.69
Dataset 3	Robust Speech Event - Test Data (speech-recognition-community-v2/eval_data, args: es), Test WER: 57.28
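For readers unfamiliar with the metrics in this table, the sketch below shows the usual definitions: WER is the word-level edit distance divided by the number of reference words, and CER is the same ratio computed over characters. This is an illustration only. The evaluation script most likely relies on a metric library and applies text normalization, neither of which is reproduced here, and the example sentences are invented. Note that this document reports WER both as a fraction (0.4016) and as a percentage (37.37).

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate as a fraction of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate as a fraction of reference characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Invented example: one wrong word out of four, one wrong character out of 20.
print(wer("el gato come pescado", "el gato como pescado"))
print(cer("el gato come pescado", "el gato como pescado"))
```

Because a single wrong character makes the whole word count as an error, WER is typically much higher than CER, which is consistent with the 37.37 WER and 7.11 CER reported for Common Voice 7.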
