🚀 wav2vec2-xls-r-300m-rm-vallader-d1
This model is fine-tuned for Automatic Speech Recognition in Romansh Vallader (rm-vallader), reaching a test WER of roughly 0.26 on Common Voice 8.
✨ Features
- Based on the pre-trained model facebook/wav2vec2-xls-r-300m, fine-tuned on the rm-vallader configuration of the mozilla-foundation/common_voice_8_0 dataset.
- Suitable for Automatic Speech Recognition tasks in Romansh Vallader.
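The snippet below is a minimal quick-start sketch, not an official usage example from this card: it loads the checkpoint with the 🤗 Transformers pipeline API. The audio path is a placeholder; input audio should be 16 kHz mono, as XLS-R models expect, and decoding a file path requires ffmpeg to be installed.

```python
from transformers import pipeline

# Load the fine-tuned checkpoint as an ASR pipeline; the feature
# extractor and tokenizer are pulled from the same repository.
asr = pipeline(
    "automatic-speech-recognition",
    model="DrishtiSharma/wav2vec2-xls-r-300m-rm-vallader-d1",
)

# "audio.wav" is a placeholder path; supply 16 kHz mono audio.
result = asr("audio.wav")
print(result["text"])
```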
📚 Documentation
Model Information
| Property | Details |
|----------|---------|
| Model Type | wav2vec2-xls-r-300m-rm-vallader-d1 |
| Training Datasets | mozilla-foundation/common_voice_8_0 |
| License | Apache-2.0 |
| Tags | automatic-speech-recognition, mozilla-foundation/common_voice_8_0, generated_from_trainer, rm-vallader, robust-speech-event, model_for_talk, hf-asr-leaderboard |
Evaluation Results
This model achieves the following results on the evaluation sets:
| Task | Dataset | Test WER | Test CER |
|------|---------|----------|----------|
| Automatic Speech Recognition | Common Voice 8 (rm-vallader) | 0.26472007722007723 | 0.05860608074430969 |
| Automatic Speech Recognition | Robust Speech Event - Dev Data (vot) | NA | NA |
Evaluation Commands
Evaluate on mozilla-foundation/common_voice_8_0 with the test split:

```bash
python eval.py --model_id DrishtiSharma/wav2vec2-xls-r-300m-rm-vallader-d1 --dataset mozilla-foundation/common_voice_8_0 --config rm-vallader --split test --log_outputs
```
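If eval.py is unavailable, roughly equivalent scores can be computed by hand. The sketch below is an assumption, not the official evaluation script: it transcribes the rm-vallader test split and scores it with the wer and cer metrics from 🤗 Datasets. The official script also normalizes transcripts (casing, punctuation), so raw scores may differ slightly; Common Voice 8 is a gated dataset, hence use_auth_token=True.

```python
from datasets import Audio, load_dataset, load_metric
from transformers import pipeline

# Common Voice 8 is gated; authentication with a HF token is required.
test = load_dataset(
    "mozilla-foundation/common_voice_8_0",
    "rm-vallader",
    split="test",
    use_auth_token=True,
)
# Resample all clips to the 16 kHz rate the model expects.
test = test.cast_column("audio", Audio(sampling_rate=16_000))

asr = pipeline(
    "automatic-speech-recognition",
    model="DrishtiSharma/wav2vec2-xls-r-300m-rm-vallader-d1",
)

# Transcribe every clip and score against the reference sentences.
predictions = [asr(sample["array"])["text"] for sample in test["audio"]]
references = test["sentence"]

print("WER:", load_metric("wer").compute(predictions=predictions, references=references))
print("CER:", load_metric("cer").compute(predictions=predictions, references=references))
```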
Evaluate on speech-recognition-community-v2/dev_data:
The Romansh Vallader language is not included in speech-recognition-community-v2/dev_data, so no dev-data evaluation is available.
Training Hyperparameters
The following hyperparameters were used during training:
- learning_rate: 7.5e-05
- train_batch_size: 32
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 100.0
- mixed_precision_training: Native AMP
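For reference, these hyperparameters map onto a 🤗 Transformers TrainingArguments object roughly as follows. This is a hypothetical reconstruction, not the author's training script; output_dir is a placeholder, and the data pipeline and model setup are omitted.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the configuration listed above.
training_args = TrainingArguments(
    output_dir="./wav2vec2-xls-r-300m-rm-vallader-d1",  # placeholder path
    learning_rate=7.5e-05,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=100.0,
    fp16=True,  # Native AMP mixed-precision training
)
```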
Training Results
| Training Loss | Epoch | Step | Validation Loss | WER |
|---------------|-------|------|-----------------|-----|
| 2.927 | 15.15 | 500 | 2.9196 | 1.0 |
| 1.3835 | 30.3 | 1000 | 0.5879 | 0.5866 |
| 0.7415 | 45.45 | 1500 | 0.3077 | 0.3316 |
| 0.5575 | 60.61 | 2000 | 0.2735 | 0.2954 |
| 0.4581 | 75.76 | 2500 | 0.2707 | 0.2802 |
| 0.3977 | 90.91 | 3000 | 0.2785 | 0.2809 |
Framework Versions
- Transformers 4.17.0.dev0
- Pytorch 1.10.2+cu102
- Datasets 1.18.2.dev0
- Tokenizers 0.11.0
📄 License
This model is licensed under the Apache-2.0 license.