🚀 wav2vec2-large-xls-r-300m-hsb-v3
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - HSB dataset. It is designed for automatic speech recognition in Upper Sorbian (hsb) and achieves the results reported in the Model Results and Training Results sections below.
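As a quick illustration of the intended use, here is a minimal inference sketch using the Transformers pipeline API; the audio file name is a placeholder and a 16 kHz mono recording is assumed.

```python
# Minimal ASR inference sketch ("sample.wav" is a placeholder for a 16 kHz mono audio file).
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="DrishtiSharma/wav2vec2-large-xls-r-300m-hsb-v3",
)

# Transcribe an Upper Sorbian audio clip and print the text.
print(asr("sample.wav")["text"])
```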
✨ Features
- Language Support: Specifically tailored for Upper Sorbian (hsb), making it suitable for speech recognition tasks in that language.
- Fine-Tuned Model: Based on the pre-trained facebook/wav2vec2-xls-r-300m model and fine-tuned on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - HSB dataset to better adapt to the target language.
- Performance Metrics: Evaluation loss and WER are reported below and can serve as a reference for model performance.
📚 Documentation
Model Information
| Property | Details |
|----------|---------|
| Model Type | wav2vec2-large-xls-r-300m-hsb-v3 |
| Training Data | mozilla-foundation/common_voice_8_0 |
| License | apache-2.0 |
| Tags | automatic-speech-recognition, mozilla-foundation/common_voice_8_0, generated_from_trainer, hsb, robust-speech-event, model_for_talk, hf-asr-leaderboard |
Model Results
The model has the following performance on different datasets (a short metric-computation sketch follows this list):
- Common Voice 8:
- Task: Automatic Speech Recognition
- Metrics:
- Test WER: 0.4763681592039801
- Test CER: 0.11194945177476305
- Robust Speech Event - Dev Data:
- Task: Automatic Speech Recognition
- Metrics:
- Test WER: NA
- Test CER: NA
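For reference, the WER and CER figures above can be computed with the metric scripts bundled with 🤗 Datasets (the "cer" metric additionally requires the jiwer package); the prediction and reference strings below are placeholders, not actual model output.

```python
# Sketch of how WER/CER are computed; the example strings are placeholders.
from datasets import load_metric

wer_metric = load_metric("wer")
cer_metric = load_metric("cer")

predictions = ["dobry dzen"]   # model transcriptions (placeholder)
references = ["dobry dźeń"]    # ground-truth transcriptions (placeholder)

print("WER:", wer_metric.compute(predictions=predictions, references=references))
print("CER:", cer_metric.compute(predictions=predictions, references=references))
```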
Evaluation Commands
Evaluate on mozilla-foundation/common_voice_8_0 with the test split:

```bash
python eval.py --model_id DrishtiSharma/wav2vec2-large-xls-r-300m-hsb-v3 --dataset mozilla-foundation/common_voice_8_0 --config hsb --split test --log_outputs
```
Evaluate on speech-recognition-community-v2/dev_data
⚠️ Important Note
The Upper Sorbian (hsb) language is not available in speech-recognition-community-v2/dev_data, so no evaluation command is provided for that dataset.
Training Hyperparameters
The following hyperparameters were used during training (a TrainingArguments sketch follows this list):
- learning_rate: 0.00045
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 50
- mixed_precision_training: Native AMP
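As a rough sketch, the hyperparameters above map onto 🤗 Transformers TrainingArguments as shown below; this is not the original training script, and the output directory name is a placeholder. The Adam betas and epsilon listed above match the TrainingArguments defaults.

```python
# Sketch: mapping the reported hyperparameters onto TrainingArguments
# ("./wav2vec2-hsb-output" is a placeholder output directory).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./wav2vec2-hsb-output",
    learning_rate=4.5e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,   # effective train batch size: 16 * 2 = 32
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=50,
    fp16=True,                       # Native AMP mixed-precision training
)
```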
Training Results
| Training Loss | Epoch | Step | Validation Loss | WER |
|---------------|-------|------|-----------------|--------|
| 8.8951 | 3.23 | 100 | 3.6396 | 1.0 |
| 3.314 | 6.45 | 200 | 3.2331 | 1.0 |
| 3.1931 | 9.68 | 300 | 3.0947 | 0.9906 |
| 1.7079 | 12.9 | 400 | 0.8865 | 0.8499 |
| 0.6859 | 16.13 | 500 | 0.7994 | 0.7529 |
| 0.4804 | 19.35 | 600 | 0.7783 | 0.7069 |
| 0.3506 | 22.58 | 700 | 0.6904 | 0.6321 |
| 0.2695 | 25.81 | 800 | 0.6519 | 0.5926 |
| 0.222 | 29.03 | 900 | 0.7041 | 0.5720 |
| 0.1828 | 32.26 | 1000 | 0.6608 | 0.5513 |
| 0.1474 | 35.48 | 1100 | 0.7129 | 0.5319 |
| 0.1269 | 38.71 | 1200 | 0.6664 | 0.5056 |
| 0.1077 | 41.94 | 1300 | 0.6712 | 0.4942 |
| 0.0934 | 45.16 | 1400 | 0.6467 | 0.4879 |
| 0.0819 | 48.39 | 1500 | 0.6549 | 0.4827 |
Framework Versions
- Transformers 4.16.1
- Pytorch 1.10.0+cu111
- Datasets 1.18.2
- Tokenizers 0.11.0