# wav2vec2-xls-r-sl-a2

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m for automatic speech recognition (ASR) in Slovenian. It reaches a test WER of about 0.22 and a CER of about 0.05 on Common Voice 8 (see Evaluation Results below).
## 🚀 Quick Start

### Evaluation Commands

To evaluate on mozilla-foundation/common_voice_8_0 with the test split:

```bash
python eval.py --model_id DrishtiSharma/wav2vec2-xls-r-sl-a2 --dataset mozilla-foundation/common_voice_8_0 --config sl --split test --log_outputs
```

For speech-recognition-community-v2/dev_data, no command is given; the original card notes only that the Votic language was not found in speech-recognition-community-v2/dev_data.
## ✨ Features

- High-quality speech recognition: achieves low WER and CER on the evaluated datasets (see Evaluation Results below).
- Fine-tuned for Slovenian: specifically optimized for the Slovenian language and suited to Slovenian speech recognition tasks.
## 📦 Installation

The original card does not document an installation procedure.
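A minimal environment sketch, assuming the standard Hugging Face stack; the exact versions the author used are listed under Framework Versions below:

```bash
# Assumed packages; the original card specifies nothing beyond the
# libraries named in its "Framework Versions" section.
pip install transformers torch datasets tokenizers
```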
## 💻 Usage Examples

The original card does not include usage code.
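The snippet below is an illustrative sketch rather than code from the original card: it loads the checkpoint through the generic transformers ASR pipeline, and the audio file name is a hypothetical 16 kHz mono Slovenian recording.

```python
from transformers import pipeline

# Load the fine-tuned Slovenian checkpoint into the generic ASR pipeline.
asr = pipeline("automatic-speech-recognition", model="DrishtiSharma/wav2vec2-xls-r-sl-a2")

# "sample_sl.wav" is a hypothetical local recording; decoding file paths
# requires ffmpeg to be installed.
result = asr("sample_sl.wav")
print(result["text"])
```

For long recordings, passing chunk_length_s (for example, asr("sample_sl.wav", chunk_length_s=10)) lets the pipeline transcribe in windows rather than in a single forward pass.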
## 📚 Documentation

### Model Information

| Property | Details |
|----------|---------|
| Model Type | wav2vec2-xls-r-sl-a2 |
| Training Data | mozilla-foundation/common_voice_8_0 |
### Evaluation Results

The model achieves the following results:

- Common Voice 8 (sl):
  - Test WER: 0.21695212999560826
  - Test CER: 0.052850080572474256
- Robust Speech Event - Dev Data (vot):
  - Test WER: 0.560722380639029
  - Test CER: 0.2279626093074681
- Robust Speech Event - Dev Data (sl): no values stated in the original card
- Robust Speech Event - Test Data (sl): no values stated in the original card
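As a hedged illustration of how WER/CER figures like these can be reproduced, the sketch below scores the model on a small sample of the Common Voice 8.0 Slovenian test split with the evaluate library. The card does not state its text normalization, so numbers from this sketch may differ from those above.

```python
import evaluate
from datasets import Audio, load_dataset
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="DrishtiSharma/wav2vec2-xls-r-sl-a2")

# Common Voice 8.0 is gated; accept its terms on the Hugging Face Hub first.
ds = load_dataset("mozilla-foundation/common_voice_8_0", "sl", split="test")
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))  # match the model's rate

sample = ds.select(range(100))  # small sample to keep the sketch fast
preds = [asr(ex["audio"]["array"])["text"] for ex in sample]
refs = [ex["sentence"] for ex in sample]

# Note: the official numbers may apply extra normalization (casing, punctuation).
print("WER:", evaluate.load("wer").compute(predictions=preds, references=refs))
print("CER:", evaluate.load("cer").compute(predictions=preds, references=refs))
```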
### Training Hyperparameters

- Learning rate: 7e-05
- Train batch size: 32
- Eval batch size: 32
- Seed: 42
- Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- LR scheduler type: linear
- LR scheduler warmup steps: 1000
- Number of epochs: 100.0
- Mixed precision training: Native AMP
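As an illustration only, not the author's actual training script, the hyperparameters above map onto transformers TrainingArguments as sketched below; the output_dir is hypothetical, and the stated Adam betas and epsilon match the TrainingArguments defaults, so they are not set explicitly.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec2-xls-r-sl-a2",  # hypothetical output directory
    learning_rate=7e-05,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=1000,
    num_train_epochs=100.0,
    fp16=True,  # "Native AMP" mixed-precision training
)
```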
### Training Results

| Training Loss | Epoch | Step | Validation Loss | WER |
|---------------|-------|------|-----------------|--------|
| 6.9294 | 6.1 | 500 | 2.9712 | 1.0 |
| 2.8305 | 12.2 | 1000 | 1.7073 | 0.9479 |
| 1.4795 | 18.29 | 1500 | 0.5756 | 0.6397 |
| 1.3433 | 24.39 | 2000 | 0.4968 | 0.5424 |
| 1.1766 | 30.49 | 2500 | 0.4185 | 0.4743 |
| 1.0017 | 36.59 | 3000 | 0.3303 | 0.3578 |
| 0.9358 | 42.68 | 3500 | 0.3003 | 0.3051 |
| 0.8358 | 48.78 | 4000 | 0.3045 | 0.2884 |
| 0.7647 | 54.88 | 4500 | 0.2866 | 0.2677 |
| 0.7482 | 60.98 | 5000 | 0.2829 | 0.2585 |
| 0.6943 | 67.07 | 5500 | 0.2782 | 0.2478 |
| 0.6586 | 73.17 | 6000 | 0.2911 | 0.2537 |
| 0.6425 | 79.27 | 6500 | 0.2817 | 0.2462 |
| 0.6067 | 85.37 | 7000 | 0.2910 | 0.2436 |
| 0.5974 | 91.46 | 7500 | 0.2875 | 0.2430 |
| 0.5812 | 97.56 | 8000 | 0.2852 | 0.2396 |
### Framework Versions

- Transformers 4.17.0.dev0
- PyTorch 1.10.2+cu102
- Datasets 1.18.2.dev0
- Tokenizers 0.11.0
## 🔧 Technical Details

### Training Process

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the Slovenian (sl) configuration of the mozilla-foundation/common_voice_8_0 dataset. Training used a learning rate of 7e-05, train and eval batch sizes of 32, and a linear learning-rate scheduler with 1000 warmup steps, and ran for 100 epochs with Native AMP mixed-precision training.
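A minimal sketch of the setup this paragraph describes, assuming the standard transformers fine-tuning pattern rather than the author's unpublished script; reusing the released checkpoint's processor stands in for the undocumented vocabulary-building step.

```python
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Common Voice 8.0 is gated; accept its terms on the Hugging Face Hub first.
train_ds = load_dataset("mozilla-foundation/common_voice_8_0", "sl", split="train")

# Reuse the released checkpoint's processor; the original run would have built
# its own character vocabulary from the training transcripts.
processor = Wav2Vec2Processor.from_pretrained("DrishtiSharma/wav2vec2-xls-r-sl-a2")

# Load the base model with a CTC head sized to the Slovenian vocabulary.
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-xls-r-300m",
    ctc_loss_reduction="mean",
    pad_token_id=processor.tokenizer.pad_token_id,
    vocab_size=len(processor.tokenizer),
)
```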
## 📄 License
This model is released under the Apache 2.0 license.