Wav2vec2-xls-r-300m-English Open-source Speech Recognition Model - Accurately Convert English Speech to Text

Wav2vec2 Xls R 300m English

Developed by vitouphy

XLS-R-300M - English is an English automatic speech recognition model, fine-tuned from facebook/wav2vec2-xls-r-300m on the librispeech_asr dataset. It achieves a word error rate of 12.29% on the LibriSpeech (clean) test set.

Speech Recognition

Transformers

English

Open Source License: Apache-2.0

#English Speech Recognition #Low Word Error Rate #Multi-scenario Adaptation

Downloads 21

Release Time: 3/2/2022

Model Overview

This is an automatic speech recognition (ASR) model fine-tuned for English speech-to-text conversion.

Model Features

Excellent Performance on Multiple Datasets

Evaluated on LibriSpeech, Common Voice 8.0, and Robust Speech Event data, with test word error rates ranging from 12.29% on LibriSpeech (clean) to 38.8% on the Robust Speech Event test data.

Efficient Training

Training used gradient accumulation (4 steps, giving a total batch size of 32) and native AMP mixed-precision training.

Low Word Error Rate

Achieves a word error rate of 12.29% and a character error rate of 3.34% on the LibriSpeech (clean) test set.

Model Capabilities

English Speech Recognition

Speech-to-Text

Long Audio Processing

Use Cases

Speech Transcription

Audiobook Transcription

Transcribe audiobook content into text

Word error rate of 12.29% on the LibriSpeech test set

Voice Assistants

Voice Command Recognition

Recognize and understand user voice commands

Word error rate of 38.8% on the Robust Speech Events test set

🚀 XLS-R-300M - English

This is a fine-tuned model for automatic speech recognition, evaluated on multiple datasets.

🚀 Quick Start

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the librispeech_asr dataset. It achieves the following results on the evaluation set:

Loss: 0.1444
Wer: 0.1167
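
The card does not include an inference snippet, so the following is only a minimal sketch of how the checkpoint might be loaded with the Transformers `pipeline` API. The Hub ID `vitouphy/wav2vec2-xls-r-300m-english` is inferred from the developer and model names on this page and is an assumption, as are the helper name `transcribe` and its parameters. Decoding an audio file from a path typically also requires ffmpeg to be installed.

```python
# Assumed Hub ID, inferred from the developer and model names on this page.
# Verify it before relying on it.
MODEL_ID = "vitouphy/wav2vec2-xls-r-300m-english"


def transcribe(audio_path, model_id=MODEL_ID, chunk_length_s=None):
    """Return the transcript of an audio file as a string (hypothetical helper)."""
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)

    call_kwargs = {}
    if chunk_length_s is not None:
        # Chunked inference: long recordings are split into windows of this length.
        call_kwargs["chunk_length_s"] = chunk_length_s

    result = asr(audio_path, **call_kwargs)
    return result["text"]
```

A call such as `transcribe("sample.wav")` (placeholder file name) would download the model on first use. For long recordings, passing something like `chunk_length_s=30` is one way to keep memory bounded; the value is illustrative, not a setting from the card.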

✨ Features

Datasets

librispeech_asr

Model Index

Task	Dataset	Metrics
Automatic Speech Recognition	LibriSpeech (clean) - test	Test WER: 12.29, Test CER: 3.34
Automatic Speech Recognition	Robust Speech Event - Dev Data	Validation WER: 36.75, Validation CER: 14.83
Automatic Speech Recognition	Common Voice 8.0 - test	Test WER: 37.81
Automatic Speech Recognition	Robust Speech Event - Test Data	Test WER: 38.8
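
For readers unfamiliar with the two metrics, the sketch below shows how WER and CER are conventionally defined: the edit distance between reference and hypothesis, divided by the reference length, counted in words or in characters. The card does not say which tool or text normalization produced the reported numbers, so the function names, the whitespace tokenization, and the choice to count spaces as characters are all assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate; assumes simple whitespace tokenization."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate; assumes spaces count as characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

With the reference "the cat sat on the mat" and the hypothesis "the cat sit on mat", there is one substituted word and one deleted word out of six, so `wer` returns 2/6. A WER of 12.29 in the table corresponds to 0.1229 on this scale.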

📚 Documentation

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

🔧 Technical Details

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 1000
num_epochs: 50
mixed_precision_training: Native AMP
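
To make two of these settings concrete, the sketch below reproduces the total batch size arithmetic and the shape of a linear schedule with warmup. It is an illustration, not the training script. The single-device setting is an assumption (the card does not state the device count), and the total of 6000 steps is an approximation read off the last row of the training results.

```python
PEAK_LR = 5e-05           # learning_rate
WARMUP_STEPS = 1000       # lr_scheduler_warmup_steps
TRAIN_BATCH_SIZE = 8      # train_batch_size
GRAD_ACCUM_STEPS = 4      # gradient_accumulation_steps
NUM_DEVICES = 1           # assumption: not stated in the card
TOTAL_STEPS = 6000        # assumption: approximate, from the training results


def effective_batch_size(per_device=TRAIN_BATCH_SIZE,
                         accum=GRAD_ACCUM_STEPS,
                         devices=NUM_DEVICES):
    """Examples consumed per optimizer step."""
    return per_device * accum * devices


def linear_lr(step, peak=PEAK_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0."""
    if step < warmup:
        return peak * step / warmup
    return peak * max(0.0, (total - step) / (total - warmup))
```

Under these assumptions, `effective_batch_size()` gives 8 × 4 × 1 = 32, matching total_train_batch_size. The learning rate rises linearly to 5e-05 at step 1000 and then decays linearly to zero by step 6000.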

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
2.9365	4.17	500	2.9398	0.9999
1.5444	8.33	1000	0.5947	0.4289
1.1367	12.5	1500	0.2751	0.2366
0.9972	16.66	2000	0.2032	0.1797
0.9118	20.83	2500	0.1786	0.1479
0.8664	24.99	3000	0.1641	0.1408
0.8251	29.17	3500	0.1537	0.1267
0.793	33.33	4000	0.1525	0.1244
0.785	37.5	4500	0.1470	0.1184
0.7612	41.66	5000	0.1446	0.1177
0.7478	45.83	5500	0.1449	0.1176
0.7443	49.99	6000	0.1444	0.1167
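
The log above can be read programmatically. The sketch below copies the rows from the table and derives a few quantities from them: the approximate number of optimizer steps per epoch, the checkpoint with the lowest validation WER, and the WER gain between consecutive checkpoints. The helper names are hypothetical.

```python
# (training_loss, epoch, step, validation_loss, wer), copied from the table above.
ROWS = [
    (2.9365, 4.17, 500, 2.9398, 0.9999),
    (1.5444, 8.33, 1000, 0.5947, 0.4289),
    (1.1367, 12.5, 1500, 0.2751, 0.2366),
    (0.9972, 16.66, 2000, 0.2032, 0.1797),
    (0.9118, 20.83, 2500, 0.1786, 0.1479),
    (0.8664, 24.99, 3000, 0.1641, 0.1408),
    (0.8251, 29.17, 3500, 0.1537, 0.1267),
    (0.793, 33.33, 4000, 0.1525, 0.1244),
    (0.785, 37.5, 4500, 0.1470, 0.1184),
    (0.7612, 41.66, 5000, 0.1446, 0.1177),
    (0.7478, 45.83, 5500, 0.1449, 0.1176),
    (0.7443, 49.99, 6000, 0.1444, 0.1167),
]


def steps_per_epoch(rows):
    """Approximate optimizer steps per epoch, from the last logged row."""
    _, epoch, step, _, _ = rows[-1]
    return step / epoch


def best_checkpoint(rows):
    """Row with the lowest validation WER."""
    return min(rows, key=lambda row: row[4])


def wer_gains(rows):
    """(step, WER reduction since the previous logged checkpoint) pairs."""
    return [(curr[2], prev[4] - curr[4]) for prev, curr in zip(rows, rows[1:])]
```

The rows imply roughly 120 optimizer steps per epoch, and the lowest WER (0.1167) is at the final step, 6000. The gains shrink to under 0.001 per 500 steps after step 4500, so the run appears to have largely plateaued by the end.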

Framework versions

Transformers 4.17.0.dev0
PyTorch 1.10.2+cu102
Datasets 1.18.2.dev0
Tokenizers 0.11.0

📄 License

This project is licensed under the Apache-2.0 license.
