🚀 wav2vec2-large-xls-r-300m-hsb-v2
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - HSB dataset. It is designed for automatic speech recognition, aiming to accurately transcribe speech in the Upper Sorbian (hsb) language.
✨ Features
- Multilingual Adaptation: Built on the large-scale wav2vec2-xls-r-300m model, it adapts well to the Upper Sorbian (hsb) language.
- High-quality Performance: Achieves relatively low WER and CER on the evaluation set, indicating high recognition accuracy.
📦 Installation
No specific installation steps are provided in the original document.
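As a hedged guideline (not from the original card), the model can be used with the standard Hugging Face stack, e.g. `pip install transformers torch datasets`; the exact versions used during training are listed under Framework Versions below.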
💻 Usage Examples
No code examples are provided in the original document.
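As a minimal sketch (not from the original card), the model can be used for transcription via the `transformers` ASR pipeline. The model ID matches the one in the evaluation command below; the audio file path is a placeholder.

```python
# Minimal inference sketch; assumes transformers and an audio backend
# (e.g. torchaudio or librosa) are installed.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="DrishtiSharma/wav2vec2-large-xls-r-300m-hsb-v2",
)

# "sample_hsb.wav" is a placeholder path; wav2vec2 models expect 16 kHz audio.
print(asr("sample_hsb.wav")["text"])
```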
📚 Documentation
Evaluation Results
Per-checkpoint validation loss and WER on the evaluation set are listed in the Training Results table below; the final checkpoint reaches a validation loss of 0.5328 and a WER of 0.4596.
Evaluation Commands
- To evaluate on mozilla-foundation/common_voice_8_0 with the test split (a hedged Python sketch of this evaluation follows this list):
python eval.py --model_id DrishtiSharma/wav2vec2-large-xls-r-300m-hsb-v2 --dataset mozilla-foundation/common_voice_8_0 --config hsb --split test --log_outputs
- To evaluate on speech-recognition-community-v2/dev_data:
Upper Sorbian (hsb) is not available in speech-recognition-community-v2/dev_data, so no evaluation command is provided for it.
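For readers who prefer Python over the CLI, the following is a rough sketch of what the test-split evaluation does; `eval.py` in the model repository is the authoritative script. The sketch assumes you have accepted the Common Voice 8.0 terms on the Hugging Face Hub (the dataset is gated) and uses `datasets.load_metric`, the metric API current for Datasets 1.18.

```python
# Hedged sketch of the test-split WER evaluation; eval.py is authoritative.
from datasets import load_dataset, load_metric
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="DrishtiSharma/wav2vec2-large-xls-r-300m-hsb-v2",
)

# Gated dataset: requires prior access approval and Hub authentication.
test = load_dataset(
    "mozilla-foundation/common_voice_8_0", "hsb",
    split="test", use_auth_token=True,
)

wer = load_metric("wer")
predictions = [asr(sample["path"])["text"] for sample in test]
print("WER:", wer.compute(predictions=predictions, references=test["sentence"]))
```

Because eval.py may normalize transcriptions before scoring, the WER from this sketch can differ from the officially reported figure.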
Training Hyperparameters
The following hyperparameters were used during training:
| Property | Details |
| --- | --- |
| Learning Rate | 0.00045 |
| Train Batch Size | 16 |
| Eval Batch Size | 8 |
| Seed | 42 |
| Gradient Accumulation Steps | 2 |
| Total Train Batch Size | 32 |
| Optimizer | Adam with betas=(0.9, 0.999) and epsilon=1e-08 |
| LR Scheduler Type | linear |
| LR Scheduler Warmup Steps | 500 |
| Num Epochs | 50 |
| Mixed Precision Training | Native AMP |
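Expressed as Hugging Face `TrainingArguments`, these settings would look roughly like the sketch below. This is a reconstruction from the table, not the original training script, and the output directory is a placeholder.

```python
from transformers import TrainingArguments

# Reconstructed from the hyperparameter table above; not the original script.
training_args = TrainingArguments(
    output_dir="./wav2vec2-large-xls-r-300m-hsb-v2",  # placeholder
    learning_rate=0.00045,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,  # 16 x 2 = effective train batch size 32
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=50,
    fp16=True,  # native AMP mixed precision
    # Adam betas (0.9, 0.999) and epsilon 1e-08 are the optimizer defaults.
)
```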
Training Results
| Training Loss | Epoch | Step | Validation Loss | WER |
| --- | --- | --- | --- | --- |
| 8.5979 | 3.23 | 100 | 3.5602 | 1.0 |
| 3.303 | 6.45 | 200 | 3.2238 | 1.0 |
| 3.2034 | 9.68 | 300 | 3.2002 | 0.9888 |
| 2.7986 | 12.9 | 400 | 1.2408 | 0.9210 |
| 1.3869 | 16.13 | 500 | 0.7973 | 0.7462 |
| 1.0228 | 19.35 | 600 | 0.6722 | 0.6788 |
| 0.8311 | 22.58 | 700 | 0.6100 | 0.6150 |
| 0.717 | 25.81 | 800 | 0.6236 | 0.6013 |
| 0.6264 | 29.03 | 900 | 0.6031 | 0.5575 |
| 0.5494 | 32.26 | 1000 | 0.5656 | 0.5309 |
| 0.4781 | 35.48 | 1100 | 0.5289 | 0.4996 |
| 0.4311 | 38.71 | 1200 | 0.5375 | 0.4768 |
| 0.3902 | 41.94 | 1300 | 0.5246 | 0.4703 |
| 0.3508 | 45.16 | 1400 | 0.5382 | 0.4696 |
| 0.3199 | 48.39 | 1500 | 0.5328 | 0.4596 |
Framework Versions
- Transformers 4.16.1
- Pytorch 1.10.0+cu111
- Datasets 1.18.2
- Tokenizers 0.11.0
🔧 Technical Details
The model is fine-tuned from the pre-trained wav2vec2-xls-r-300m model on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - HSB dataset. Through the hyperparameter settings and training process described above, it adapts to the characteristics of Upper Sorbian and achieves good speech recognition performance.
📄 License
This model is released under the Apache 2.0 license.