sammy786/wav2vec2-xlsr-romansh_sursilvan
This model is a fine-tuned version of facebook/wav2vec2-xls-r-1b on the rm-sursilv subset of the mozilla-foundation/common_voice_8_0 dataset; its evaluation results are reported under Training results below.
Quick Start
This model is a fine-tuned version of facebook/wav2vec2-xls-r-1b on the mozilla-foundation/common_voice_8_0 rm-sursilv dataset.
It achieves the results listed under Training results below on the evaluation set (a 10 percent hold-out of the train, dev, and other splits merged together).
Features
Model description
"facebook/wav2vec2-xls-r-1b" was finetuned.
Intended uses & limitations
More information needed
Installation
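No installation steps are given in the source card. A reasonable assumption is the standard Hugging Face stack, e.g. `pip install transformers torch torchaudio datasets`.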
Usage Examples
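The source card ships no example code. Below is a minimal inference sketch, assuming the standard transformers CTC API and a local audio file (`sample.wav` is a placeholder path):

```python
# Minimal inference sketch (assumed usage; not part of the original card).
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "sammy786/wav2vec2-xlsr-romansh_sursilvan"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# Load a local recording and resample to the 16 kHz rate wav2vec2 expects.
waveform, sample_rate = torchaudio.load("sample.wav")  # placeholder path
waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)

inputs = processor(waveform.squeeze(0).numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding: pick the most likely token at each frame.
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```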
Documentation
Training and evaluation data
Training data:
Common Voice Romansh Sursilvan train.tsv, dev.tsv and other.tsv
Training procedure
To create the train dataset, all available splits were appended and a 90-10 train/evaluation split was applied.
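An illustrative sketch of that procedure with the datasets library follows; the exact preparation script is not included in the card, and the split seed is an assumption:

```python
# Illustrative reconstruction of the described 90-10 split (not the original
# script). Common Voice 8.0 is gated, so load_dataset needs an HF auth token.
from datasets import concatenate_datasets, load_dataset

splits = [
    load_dataset("mozilla-foundation/common_voice_8_0", "rm-sursilv", split=s)
    for s in ("train", "validation", "other")
]

# Append all available data, then hold out 10 percent for evaluation.
merged = concatenate_datasets(splits)
split = merged.train_test_split(test_size=0.1, seed=13)  # seed is an assumption
train_dataset, eval_dataset = split["train"], split["test"]
```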
Training hyperparameters
The following hyperparameters were used during training (a sketch mapping them onto transformers TrainingArguments follows the list):
- learning_rate: 0.000045637994662983496
- train_batch_size: 16
- eval_batch_size: 16
- seed: 13
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: cosine_with_restarts
- lr_scheduler_warmup_steps: 500
- num_epochs: 40
- mixed_precision_training: Native AMP
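For reference, a hypothetical mapping of these values onto transformers TrainingArguments; the original training script is not part of this card, and output_dir is a placeholder:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec2-xlsr-romansh_sursilvan",  # placeholder
    learning_rate=0.000045637994662983496,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=13,
    gradient_accumulation_steps=2,      # 16 * 2 = 32 total train batch size
    lr_scheduler_type="cosine_with_restarts",
    warmup_steps=500,
    num_train_epochs=40,
    fp16=True,                          # native AMP mixed precision
)
```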
Training results
| Step | Training Loss | Validation Loss | WER      |
|------|---------------|-----------------|----------|
| 200  | 4.825500      | 2.932350        | 1.000000 |
| 400  | 1.325600      | 0.292645        | 0.415436 |
| 600  | 0.709800      | 0.219167        | 0.324451 |
| 800  | 0.576800      | 0.174390        | 0.275477 |
| 1000 | 0.538100      | 0.183737        | 0.272116 |
| 1200 | 0.475200      | 0.159078        | 0.253871 |
| 1400 | 0.420400      | 0.167277        | 0.240907 |
| 1600 | 0.393500      | 0.167216        | 0.247269 |
| 1800 | 0.407500      | 0.178282        | 0.239827 |
| 2000 | 0.374400      | 0.184590        | 0.239467 |
| 2200 | 0.382600      | 0.164106        | 0.227824 |
| 2400 | 0.363100      | 0.162543        | 0.228544 |
| 2600 | 0.199000      | 0.172903        | 0.231665 |
| 2800 | 0.150800      | 0.160117        | 0.222662 |
| 3000 | 0.101100      | 0.169553        | 0.222662 |
| 3200 | 0.104200      | 0.161056        | 0.220622 |
| 3400 | 0.096900      | 0.161562        | 0.216781 |
| 3600 | 0.092200      | 0.163880        | 0.212580 |
| 3800 | 0.089200      | 0.162288        | 0.214140 |
| 4000 | 0.076200      | 0.160470        | 0.213540 |
| 4200 | 0.087900      | 0.162827        | 0.213060 |
| 4400 | 0.066200      | 0.161096        | 0.213300 |
| 4600 | 0.076000      | 0.162060        | 0.213660 |
| 4800 | 0.071400      | 0.162045        | 0.213300 |
Framework versions
- Transformers 4.16.0.dev0
- Pytorch 1.10.0+cu102
- Datasets 1.17.1.dev0
- Tokenizers 0.10.3
Evaluation Commands
To evaluate on mozilla-foundation/common_voice_8_0 with split test:

```bash
python eval.py --model_id sammy786/wav2vec2-xlsr-romansh_sursilvan --dataset mozilla-foundation/common_voice_8_0 --config rm-sursilv --split test
```
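As a programmatic alternative, the test-split WER can be computed directly; the sketch below uses jiwer and skips any text normalization eval.py may apply, so its number can differ from the official score:

```python
# Hypothetical stand-in for eval.py: transcribe the test split and score WER.
import torch
from datasets import Audio, load_dataset
from jiwer import wer
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "sammy786/wav2vec2-xlsr-romansh_sursilvan"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

test = load_dataset("mozilla-foundation/common_voice_8_0", "rm-sursilv", split="test")
test = test.cast_column("audio", Audio(sampling_rate=16_000))

predictions, references = [], []
for sample in test:
    inputs = processor(sample["audio"]["array"], sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    predictions.append(processor.batch_decode(torch.argmax(logits, dim=-1))[0])
    references.append(sample["sentence"])

print("WER:", wer(references, predictions))
```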
Technical Details
The model was produced by fine-tuning facebook/wav2vec2-xls-r-1b on the rm-sursilv Common Voice data. All available splits were appended and a 90-10 split created the training and evaluation sets. With the hyperparameters listed above, validation WER fell to roughly 0.213 by step 4800 (best 0.212580 at step 3600), as shown in the training results table.
License