🚀 wav2vec2-large-xls-r-300m-hsb-v1
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - HSB dataset. It performs automatic speech recognition for the Upper Sorbian (HSB) language.
✨ Features
- Language Support: Specialized for automatic speech recognition in the Upper Sorbian (HSB) language.
- Fine-Tuned Model: Based on the pre-trained facebook/wav2vec2-xls-r-300m model, fine-tuned on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - HSB dataset.
- Comprehensive Evaluation: Evaluated on multiple datasets with metrics such as WER and CER.
📦 Installation
The original card does not list installation steps. A typical setup, assuming the standard Hugging Face stack and pinning the versions reported under "Framework versions" below:
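```bash
# Assumed setup; version pins follow the "Framework versions" section of this card.
pip install transformers==4.16.1 datasets==1.18.2 tokenizers==0.11.0
pip install torch torchaudio
```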
💻 Usage Examples
Basic Usage
The original card ships no usage code. Below is a minimal sketch of standard wav2vec2 CTC inference with transformers and torchaudio; the path "sample.wav" is a placeholder, and the model expects 16 kHz mono audio.
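```python
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "DrishtiSharma/wav2vec2-large-xls-r-300m-hsb-v1"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# "sample.wav" is a placeholder; resample to the 16 kHz the model was trained on
waveform, sample_rate = torchaudio.load("sample.wav")
if sample_rate != 16_000:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)

inputs = processor(waveform.squeeze(0).numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```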
Advanced Usage
Also not covered by the original card. One common pattern is chunked transcription of long recordings through the high-level ASR pipeline; the chunk_length_s value below is an illustrative choice, and chunked CTC inference requires a reasonably recent transformers release.
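```python
from transformers import pipeline

# High-level pipeline sketch; "long_audio.wav" is a placeholder path.
asr = pipeline(
    "automatic-speech-recognition",
    model="DrishtiSharma/wav2vec2-large-xls-r-300m-hsb-v1",
    chunk_length_s=10,  # split long audio into overlapping 10 s chunks
)
print(asr("long_audio.wav")["text"])
```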
📚 Documentation
Evaluation Commands
- To evaluate on mozilla-foundation/common_voice_8_0 with the test split (a rough Python equivalent follows this list):

```bash
python eval.py --model_id DrishtiSharma/wav2vec2-large-xls-r-300m-hsb-v1 --dataset mozilla-foundation/common_voice_8_0 --config hsb --split test --log_outputs
```

- To evaluate on speech-recognition-community-v2/dev_data: not possible, as the Upper Sorbian language isn't available in speech-recognition-community-v2/dev_data.
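The sketch below approximates what the eval command above measures: transcribe the Common Voice 8.0 HSB test split and score it with WER. It omits the text normalization that eval.py applies, so scores may differ slightly; the dataset is gated, so accepting its terms on the Hub and authenticating is required.

```python
import torch
from datasets import Audio, load_dataset, load_metric
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "DrishtiSharma/wav2vec2-large-xls-r-300m-hsb-v1"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# Gated dataset: accept the Common Voice terms and run `huggingface-cli login` first
test = load_dataset("mozilla-foundation/common_voice_8_0", "hsb", split="test", use_auth_token=True)
test = test.cast_column("audio", Audio(sampling_rate=16_000))

def transcribe(batch):
    inputs = processor(batch["audio"]["array"], sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    batch["prediction"] = processor.batch_decode(torch.argmax(logits, dim=-1))[0]
    return batch

test = test.map(transcribe)
wer = load_metric("wer")
print(wer.compute(predictions=test["prediction"], references=test["sentence"]))
```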
Training hyperparameters
The following hyperparameters were used during training (a TrainingArguments sketch follows this list):
- learning_rate: 0.00045
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 50
- mixed_precision_training: Native AMP
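These map onto transformers TrainingArguments roughly as sketched below; output_dir is a placeholder, and the Adam betas and epsilon listed above match the library defaults:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec2-large-xls-r-300m-hsb-v1",  # placeholder
    learning_rate=4.5e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,  # effective train batch size of 32
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=50,
    fp16=True,  # native AMP mixed-precision training
)
```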
Training results
| Training Loss | Epoch | Step | Validation Loss | Wer |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| 8.972 | 3.23 | 100 | 3.7498 | 1.0 |
| 3.3401 | 6.45 | 200 | 3.2320 | 1.0 |
| 3.2046 | 9.68 | 300 | 3.1741 | 0.9806 |
| 2.4031 | 12.9 | 400 | 1.0579 | 0.8996 |
| 1.0427 | 16.13 | 500 | 0.7989 | 0.7557 |
| 0.741 | 19.35 | 600 | 0.6405 | 0.6299 |
| 0.5699 | 22.58 | 700 | 0.6129 | 0.5928 |
| 0.4607 | 25.81 | 800 | 0.6548 | 0.5695 |
| 0.3827 | 29.03 | 900 | 0.6268 | 0.5190 |
| 0.3282 | 32.26 | 1000 | 0.5919 | 0.5016 |
| 0.2764 | 35.48 | 1100 | 0.5953 | 0.4805 |
| 0.2335 | 38.71 | 1200 | 0.5717 | 0.4728 |
| 0.2106 | 41.94 | 1300 | 0.5674 | 0.4569 |
| 0.1592 | 48.39 | 1500 | 0.5684 | 0.4402 |
Framework versions
- Transformers 4.16.1
- Pytorch 1.10.0+cu111
- Datasets 1.18.2
- Tokenizers 0.11.0
🔧 Technical Details
The model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - HSB dataset. On the evaluation set it reaches a loss of 0.5684 and a WER of 0.4402. Training used the hyperparameters listed above, and the training-results table tracks training loss, validation loss, and WER across epochs and steps.
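For reference, WER (word error rate) counts word-level substitutions, deletions, and insertions against a reference transcript. A minimal sketch using the jiwer package (an assumption; the card does not name its scoring tool) with hypothetical strings:

```python
from jiwer import wer

reference = "a reference transcript"    # hypothetical ground truth
hypothesis = "a referenced transcript"  # hypothetical model output: one substitution
print(wer(reference, hypothesis))       # 1 error over 3 reference words ~= 0.333
```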
📄 License
The model is licensed under the Apache-2.0 license.
Additional Information
| Property | Details |
|:---------|:--------|
| Model Type | Fine-tuned wav2vec2-large-xls-r-300m-hsb-v1 for automatic speech recognition |
| Training Data | mozilla-foundation/common_voice_8_0 |
| Tags | automatic-speech-recognition, mozilla-foundation/common_voice_8_0, generated_from_trainer, hsb, robust-speech-event, model_for_talk, hf-asr-leaderboard |