🚀 wav2vec2-large-xls-r-300m-sl-with-LM-v1
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - SL dataset. It performs automatic speech recognition (ASR), converting Slovenian speech to text.
📦 Installation
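The original card lists no installation steps; the commands below sketch a typical environment for running this model (an assumption, not part of the original card):

```bash
# transformers + torch for the model, librosa for audio loading;
# pyctcdecode + kenlm are needed for the LM-boosted decoding.
pip install transformers torch librosa pyctcdecode
pip install https://github.com/kpu/kenlm/archive/master.zip
```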
💻 Usage Examples
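The original card ships no usage snippet. Below is a minimal transcription sketch, assuming the repository includes the n-gram language-model files implied by the "+LM" results and that the packages from the installation section are available; `sample.sl.wav` is a placeholder path:

```python
import librosa
import torch
from transformers import AutoModelForCTC, Wav2Vec2ProcessorWithLM

model_id = "DrishtiSharma/wav2vec2-large-xls-r-300m-sl-with-LM-v1"
processor = Wav2Vec2ProcessorWithLM.from_pretrained(model_id)
model = AutoModelForCTC.from_pretrained(model_id)

# Load audio at the 16 kHz sampling rate XLS-R models expect.
speech, sr = librosa.load("sample.sl.wav", sr=16_000)
inputs = processor(speech, sampling_rate=sr, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# batch_decode on the LM-aware processor applies the n-gram language
# model (the "+LM" rows in the results below).
transcription = processor.batch_decode(logits.numpy()).text[0]
print(transcription)
```

If the LM files were absent, falling back to plain `Wav2Vec2Processor` with greedy `argmax` decoding would correspond to the non-LM numbers in the results.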
📚 Documentation
Model Information
| Property | Details |
|---|---|
| Language | Slovenian (sl) |
| License | Apache-2.0 |
| Tags | automatic-speech-recognition, generated_from_trainer, hf-asr-leaderboard, model_for_talk, mozilla-foundation/common_voice_8_0, robust-speech-event, sl |
| Datasets | mozilla-foundation/common_voice_8_0 |
Model Results
The model `wav2vec2-large-xls-r-300m-sl-with-LM-v1` has the following evaluation results:

- Common Voice 8 (Test Data):
  - Test WER: 0.20626555409164105
  - Test CER: 0.051648321634392154
  - Test WER (+LM): 0.13482652613087395
  - Test CER (+LM): 0.038838663862562475
- Robust Speech Event - Dev Data:
  - Dev WER: 0.5406156320830592
  - Dev CER: 0.22249723590310583
  - Dev WER (+LM): 0.49783147459727384
  - Dev CER (+LM): 0.1591062599627158
- Robust Speech Event - Test Data:
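For reference, WER and CER figures like these can be recomputed with the `evaluate` library (an illustrative snippet, not part of the original card; the strings are hypothetical):

```python
import evaluate

# WER = (substitutions + deletions + insertions) / reference word count;
# CER is the same edit-distance ratio at the character level.
wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

predictions = ["to je test"]    # hypothetical model output
references = ["to je en test"]  # hypothetical ground truth
print(wer_metric.compute(predictions=predictions, references=references))  # 0.25 (one missed word out of four)
print(cer_metric.compute(predictions=predictions, references=references))
```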
Evaluation Commands
- To evaluate on mozilla-foundation/common_voice_8_0 with the test split:

  ```bash
  python eval.py --model_id DrishtiSharma/wav2vec2-large-xls-r-300m-sl-with-LM-v1 --dataset mozilla-foundation/common_voice_8_0 --config sl --split test --log_outputs
  ```

- To evaluate on speech-recognition-community-v2/dev_data:

  ```bash
  python eval.py --model_id DrishtiSharma/wav2vec2-large-xls-r-300m-sl-with-LM-v1 --dataset speech-recognition-community-v2/dev_data --config sl --split validation --chunk_length_s 10 --stride_length_s 1
  ```
Training Hyperparameters
The following hyperparameters were used during training:
- learning_rate: 7.1e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1000
- num_epochs: 100.0
- mixed_precision_training: Native AMP
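For illustration, these hyperparameters map onto `transformers.TrainingArguments` roughly as follows (a sketch, not the author's actual training script; `output_dir` and anything not listed above are placeholders):

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="wav2vec2-large-xls-r-300m-sl-with-LM-v1",  # placeholder
    learning_rate=7.1e-05,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    warmup_steps=1000,
    num_train_epochs=100.0,
    fp16=True,  # "Native AMP" mixed-precision training
)
```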
Training Results
| Training Loss | Epoch | Step | Validation Loss | WER |
|---|---|---|---|---|
| 3.3881 | 6.1 | 500 | 2.9710 | 1.0 |
| 2.6401 | 12.2 | 1000 | 1.7677 | 0.9734 |
| 1.5152 | 18.29 | 1500 | 0.5564 | 0.6011 |
| 1.2191 | 24.39 | 2000 | 0.4319 | 0.4390 |
| 1.0237 | 30.49 | 2500 | 0.3141 | 0.3175 |
| 0.8892 | 36.59 | 3000 | 0.2748 | 0.2689 |
| 0.8296 | 42.68 | 3500 | 0.2680 | 0.2534 |
| 0.7602 | 48.78 | 4000 | 0.2820 | 0.2506 |
| 0.7186 | 54.88 | 4500 | 0.2672 | 0.2398 |
| 0.6887 | 60.98 | 5000 | 0.2729 | 0.2402 |
| 0.6507 | 67.07 | 5500 | 0.2767 | 0.2361 |
| 0.6226 | 73.17 | 6000 | 0.2817 | 0.2332 |
| 0.6024 | 79.27 | 6500 | 0.2679 | 0.2279 |
| 0.5787 | 85.37 | 7000 | 0.2837 | 0.2316 |
| 0.5744 | 91.46 | 7500 | 0.2838 | 0.2284 |
| 0.5556 | 97.56 | 8000 | 0.2763 | 0.2281 |
Framework Versions
- Transformers 4.17.0.dev0
- Pytorch 1.10.2+cu102
- Datasets 1.18.2.dev0
- Tokenizers 0.11.0
📄 License
This model is released under the Apache-2.0 license.