wav2vec2-large-xls-r-300m-ia: Open-Source Speech Recognition Model for Interlingua

Wav2vec2 Large Xls R 300m Ia

Developed by ayameRushia

An automatic speech recognition model for Interlingua, fine-tuned from facebook/wav2vec2-xls-r-300m on the Common Voice 8.0 Interlingua (ia) dataset

Speech Recognition

Transformers

Open Source License: Apache-2.0 #Interlingua Speech Recognition #Low Word Error Rate #Multilingual Support

Downloads 23

Release Time : 3/2/2022

Model Overview

This is an automatic speech recognition (ASR) model for Interlingua, fine-tuned on the Common Voice 8.0 dataset. It converts speech to text.

Model Features

High-Performance Speech Recognition

Achieves a word error rate (WER) of 8.6074% and a character error rate (CER) of 2.4147% on the Common Voice 8.0 Interlingua test set when decoding with the language model

Language Model Support

Supports decoding with a language model, which lowers the test WER from 20.1776% to 8.6074% and the test CER from 4.7205% to 2.4147%

Based on Large-Scale Pretrained Model

Fine-tuned from the facebook/wav2vec2-xls-r-300m model, inheriting its multilingual pretrained speech representations

Model Capabilities

Speech-to-Text

Interlingua Speech Recognition

Supports Language Model Decoding

Use Cases

Speech Transcription

Interlingua Speech Transcription

Convert Interlingua speech into text

Achieves a word error rate of 8.6074% on the test set with language model decoding

Voice Assistants

Interlingua Voice Command Recognition

Recognize voice commands spoken in Interlingua

🚀 wav2vec2-large-xls-r-300m-ia

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the Interlingua (ia) subset of the mozilla-foundation/common_voice_8_0 dataset. It is designed for automatic speech recognition and reaches a WER of 0.1253 on the evaluation set.
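
For quick transcription of a single recording, a minimal sketch using the Transformers `pipeline` API might look like the following. It is not taken from the model repo: the helper name `transcribe` and the example file path are hypothetical, and it assumes `transformers` and `torch` are installed, with ffmpeg available for decoding audio files. Language model decoding may additionally require the `pyctcdecode` and `kenlm` packages.

```python
MODEL_ID = "ayameRushia/wav2vec2-large-xls-r-300m-ia"


def transcribe(audio_path, model_id=MODEL_ID):
    """Transcribe one audio file and return the recognized text.

    Hypothetical helper for illustration; not part of the model repo.
    """
    # Imported lazily so this module loads even where transformers is absent.
    from transformers import pipeline

    # The ASR pipeline loads the model and its processor from the Hub.
    asr = pipeline("automatic-speech-recognition", model=model_id)
    return asr(audio_path)["text"]


# Example usage ("sample.wav" is a placeholder for your own recording):
# print(transcribe("sample.wav"))
```

The model is downloaded on the first call, so the first transcription is slower than later ones.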

🚀 Quick Start

Evaluation

You can evaluate the model using the following commands:

# Log in to Hugging Face to get the auth token needed to access Common Voice 8.0
huggingface-cli login

# Run with the language model (LM)
python eval.py --model_id ayameRushia/wav2vec2-large-xls-r-300m-ia --dataset mozilla-foundation/common_voice_8_0 --config ia --split test

# Run without the LM (greedy decoding)
python eval.py --model_id ayameRushia/wav2vec2-large-xls-r-300m-ia --dataset mozilla-foundation/common_voice_8_0 --config ia --split test --greedy
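
To illustrate what decoding without a language model generally means for a CTC model such as wav2vec2, here is a minimal sketch of greedy CTC decoding: take the most likely token per audio frame, collapse consecutive repeats, then drop the blank token. This is a sketch of the general technique, not the code in `eval.py`; the toy vocabulary and the blank id of 0 are assumptions made for the example.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0):
    """Collapse repeated per-frame predictions, then remove CTC blanks."""
    # groupby merges runs of identical consecutive ids into one entry.
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    return "".join(id_to_char[i] for i in collapsed if i != blank_id)


# Hypothetical toy vocabulary; the real model ships its own vocabulary file.
vocab = {0: "<blank>", 1: "i", 2: "l", 3: "a"}

# Per-frame argmax ids. The blank between the two "l" runs keeps them
# from being merged into a single letter.
frames = [1, 1, 2, 2, 0, 2, 3, 3]
decoded = ctc_greedy_decode(frames, vocab)
print(decoded)  # illa
```

Decoding with a language model instead scores several candidate transcriptions and favors word sequences that the language model finds likely.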

✨ Features

Fine-tuned Model: Based on facebook/wav2vec2-xls-r-300m, fine-tuned on the Interlingua subset of the common_voice dataset.
Good Performance: Achieves a word error rate (WER) of 8.6074% and a character error rate (CER) of 2.4147% on the test set when decoding with the language model.

📚 Documentation

Model Information

Property	Details
Model Type	wav2vec2-large-xls-r-300m-ia
Training Data	mozilla-foundation/common_voice_8_0

Evaluation Results

This model achieves the following results on the evaluation set (the values match the final row of the training results table, step 6400):

Loss: 0.1452
WER: 0.1253

Training Procedure

Training was conducted in Google Colab, and the training notebook is provided in the repo.

Training and Evaluation Data

Language Model: Built from the processed sentences in the train and validation splits of the dataset (Common Voice 8.0 for Interlingua).
Evaluation: Conducted in the notebook "notebook_evaluation_wav2vec2_ia.ipynb" within the repo.

Evaluation Metrics

Test results without LM:
- WER = 20.1776%
- CER = 4.7205%
Test results with LM:
- WER = 8.6074%
- CER = 2.4147%
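
For readers unfamiliar with these metrics, the sketch below shows how WER and CER are commonly defined: the edit distance between reference and hypothesis, divided by the reference length, counted in words or in characters. It scores a single utterance, whereas an evaluation script typically aggregates over the whole test set, so treat it as an illustration and not as the code behind the numbers above. The example sentences are made up.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate of one hypothesis against one reference."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate of one hypothesis against one reference."""
    return edit_distance(reference, hypothesis) / len(reference)


# Made-up example: one word out of four is wrong, by a single character.
reference = "le lingua es facile"
hypothesis = "le lingua es facil"
print(wer(reference, hypothesis))  # 0.25
print(cer(reference, hypothesis))

# Relative WER reduction from LM decoding, using the test figures above.
relative_wer_reduction = (20.1776 - 8.6074) / 20.1776
print(f"{relative_wer_reduction:.1%}")  # 57.3%
```

The example also shows why CER is usually much lower than WER: one wrong character makes a whole word count as an error.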

Training Hyperparameters

The following hyperparameters were used during training:

learning_rate: 3e-05
train_batch_size: 16
eval_batch_size: 4
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 400
num_epochs: 30
mixed_precision_training: Native AMP
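
Two of these values interact in ways that are easy to miss, so here is a small sketch under stated assumptions. The total batch size of 32 is the per-device batch size times the gradient accumulation steps, which implies a single device. The linear scheduler ramps the learning rate up over the 400 warmup steps and then decays it linearly to zero. The total of 6420 optimizer steps is an estimate (about 214 steps per epoch over 30 epochs, inferred from the training results table), not a figure stated in the source.

```python
def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    """Number of examples contributing to one optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices


def linear_lr(step, base_lr=3e-5, warmup_steps=400, total_steps=6420):
    """Learning rate at a given optimizer step: linear warmup, linear decay.

    total_steps=6420 is an assumption estimated from the results table.
    """
    if step < warmup_steps:
        return base_lr * (step / warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / (total_steps - warmup_steps))


print(effective_batch_size(16, 2))  # 32
for step in (0, 200, 400, 3410, 6420):
    print(step, linear_lr(step))
```

The peak learning rate of 3e-05 is reached exactly at step 400, which is also where the first row of the training results table is logged.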

Training Results

Training Loss	Epoch	Step	Validation Loss	Wer
7.432	1.87	400	2.9636	1.0
2.6922	3.74	800	2.2111	0.9977
1.2581	5.61	1200	0.4864	0.4028
0.6232	7.48	1600	0.2807	0.2413
0.4479	9.35	2000	0.2219	0.1885
0.3654	11.21	2400	0.1886	0.1606
0.323	13.08	2800	0.1716	0.1444
0.2935	14.95	3200	0.1687	0.1443
0.2707	16.82	3600	0.1632	0.1382
0.2559	18.69	4000	0.1507	0.1337
0.2433	20.56	4400	0.1572	0.1358
0.2338	22.43	4800	0.1489	0.1305
0.2258	24.3	5200	0.1485	0.1278
0.2218	26.17	5600	0.1470	0.1272
0.2169	28.04	6000	0.1470	0.1270
0.2117	29.91	6400	0.1452	0.1253

Framework Versions

Transformers 4.17.0.dev0
PyTorch 1.10.0+cu111
Datasets 1.18.3
Tokenizers 0.11.0

📄 License

This project is licensed under the Apache-2.0 license.
