🚀 wav2vec2-large-xls-r-300m-sl-with-LM-v2
This is a fine-tuned version of facebook/wav2vec2-xls-r-300m for automatic speech recognition in Slovenian. It was trained on the mozilla-foundation/common_voice_8_0 (sl) dataset and reaches a test WER of 0.1455 on Common Voice 8 with language-model decoding.
✨ Features
- Automatic Speech Recognition: Specialized for Slovenian speech recognition.
- Trained on a High-Quality Dataset: Utilizes the Mozilla Foundation's Common Voice 8.0 dataset.
- Robust Performance: Evaluated on both the Common Voice 8 test set and the Robust Speech Event dev data.
📦 Installation
The original card lists no installation steps. In practice, the model loads through the Hugging Face `transformers` library; decoding with the bundled language model additionally requires `pyctcdecode` and `kenlm` (e.g. `pip install transformers pyctcdecode kenlm`).
💻 Usage Examples
Basic Usage
The following are the evaluation commands for different datasets:
Evaluate on `mozilla-foundation/common_voice_8_0`:

```bash
python eval.py --model_id DrishtiSharma/wav2vec2-large-xls-r-300m-sl-with-LM-v2 --dataset mozilla-foundation/common_voice_8_0 --config sl --split test --log_outputs
```
Evaluate on `speech-recognition-community-v2/dev_data`:

```bash
python eval.py --model_id DrishtiSharma/wav2vec2-large-xls-r-300m-sl-with-LM-v2 --dataset speech-recognition-community-v2/dev_data --config sl --split validation --chunk_length_s 10 --stride_length_s 1
```
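Beyond the eval script, the model can be used for inference directly. Below is a minimal sketch (not from the original card) that loads the checkpoint with its LM-backed decoder and transcribes a local 16 kHz audio file; the path `sample_sl.wav` is a placeholder, and `librosa`, `pyctcdecode`, and `kenlm` are assumed to be installed.

```python
# Hedged inference sketch: load the model and its LM-backed processor,
# then transcribe a local Slovenian audio file ("sample_sl.wav" is a placeholder).
import torch
import librosa
from transformers import AutoModelForCTC, Wav2Vec2ProcessorWithLM

model_id = "DrishtiSharma/wav2vec2-large-xls-r-300m-sl-with-LM-v2"
model = AutoModelForCTC.from_pretrained(model_id)
processor = Wav2Vec2ProcessorWithLM.from_pretrained(model_id)

# wav2vec2 expects 16 kHz mono input
speech, _ = librosa.load("sample_sl.wav", sr=16_000, mono=True)
inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# batch_decode on a Wav2Vec2ProcessorWithLM runs pyctcdecode beam search
transcription = processor.batch_decode(logits.numpy()).text[0]
print(transcription)
```

Decoding without the language model (e.g. with a plain `Wav2Vec2Processor` and argmax decoding) corresponds to the non-LM WER/CER figures reported below.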
📚 Documentation
Model Information
| Property | Details |
|----------|---------|
| Model Type | wav2vec2-large-xls-r-300m-sl-with-LM-v2 |
| Training Datasets | mozilla-foundation/common_voice_8_0 |
| Languages | Slovenian (sl) |
Evaluation Results
The model achieves the following evaluation results (a sketch of how the WER/CER metrics are computed follows the list):
- Common Voice 8:
- Test WER: 0.21695212999560826
- Test CER: 0.052850080572474256
- Test WER (+LM): 0.14551310203484116
- Test CER (+LM): 0.03927566711277415
- Robust Speech Event - Dev Data:
- Dev WER: 0.560722380639029
- Dev CER: 0.2279626093074681
- Dev WER (+LM): 0.46486802661402354
- Dev CER (+LM): 0.21105136194592422
- Robust Speech Event - Test Data: no metrics reported.
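For context, WER and CER are normalized edit distances over words and characters, respectively. A minimal illustration using the `jiwer` package (an assumption, not part of the original card; the Slovenian strings are made-up placeholders):

```python
# Illustrative WER/CER computation with jiwer (placeholder strings).
import jiwer

reference = "danes je lep dan"   # ground-truth transcript
hypothesis = "danes je lep den"  # model output with one word error

print(f"WER: {jiwer.wer(reference, hypothesis)}")  # 1 substituted word / 4 words = 0.25
print(f"CER: {jiwer.cer(reference, hypothesis)}")  # character-level analogue
```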
Training Hyperparameters
The following hyperparameters were used during training (see the `TrainingArguments` sketch after this list):
- learning_rate: 7e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1000
- num_epochs: 100.0
- mixed_precision_training: Native AMP
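These values map directly onto the standard 🤗 `TrainingArguments`. The following is a hedged reconstruction from the listed hyperparameters, not the author's original training script; `output_dir` is a placeholder:

```python
# Hedged reconstruction of the training configuration from the listed
# hyperparameters; this is not the author's original script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./wav2vec2-large-xls-r-300m-sl",  # placeholder
    learning_rate=7e-05,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    adam_beta1=0.9,            # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    warmup_steps=1000,
    num_train_epochs=100.0,
    fp16=True,                 # "Native AMP" mixed-precision training
)
```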
Training Results
| Training Loss | Epoch | Step | Validation Loss | WER |
|---------------|-------|------|-----------------|--------|
| 6.9294 | 6.1 | 500 | 2.9712 | 1.0 |
| 2.8305 | 12.2 | 1000 | 1.7073 | 0.9479 |
| 1.4795 | 18.29 | 1500 | 0.5756 | 0.6397 |
| 1.3433 | 24.39 | 2000 | 0.4968 | 0.5424 |
| 1.1766 | 30.49 | 2500 | 0.4185 | 0.4743 |
| 1.0017 | 36.59 | 3000 | 0.3303 | 0.3578 |
| 0.9358 | 42.68 | 3500 | 0.3003 | 0.3051 |
| 0.8358 | 48.78 | 4000 | 0.3045 | 0.2884 |
| 0.7647 | 54.88 | 4500 | 0.2866 | 0.2677 |
| 0.7482 | 60.98 | 5000 | 0.2829 | 0.2585 |
| 0.6943 | 67.07 | 5500 | 0.2782 | 0.2478 |
| 0.6586 | 73.17 | 6000 | 0.2911 | 0.2537 |
| 0.6425 | 79.27 | 6500 | 0.2817 | 0.2462 |
| 0.6067 | 85.37 | 7000 | 0.2910 | 0.2436 |
| 0.5974 | 91.46 | 7500 | 0.2875 | 0.2430 |
| 0.5812 | 97.56 | 8000 | 0.2852 | 0.2396 |
Framework Versions
- Transformers 4.17.0.dev0
- Pytorch 1.10.2+cu102
- Datasets 1.18.2.dev0
- Tokenizers 0.11.0
📄 License
This model is licensed under the Apache-2.0 license.