Open-source Speech Recognition Model wav2vec2-xls-r-300m-rm-sursilv-d11 - Accurately Recognize the Sursilvan Dialect of Romansh

Wav2vec2 Xls R 300m Rm Sursilv D11

Developed by DrishtiSharma

This automatic speech recognition model is fine-tuned from facebook/wav2vec2-xls-r-300m on Romansh-Sursilvan speech from Common Voice 8.0. It achieves a 24.09% word error rate (WER) on the Common Voice 8 test set.

Speech Recognition

Transformers

Open Source License: Apache-2.0 #Romansh speech recognition #Low Word Error Rate (WER) #Common Voice adaptation

Downloads 20

Release Time: 3/2/2022

Model Overview

This is an automatic speech recognition (speech-to-text) model for the Romansh-Sursilvan dialect, fine-tuned from the pretrained wav2vec2-xls-r-300m checkpoint.

Model Features

Low-resource language support

Fine-tuned specifically for the low-resource Romansh-Sursilvan dialect

High performance

Achieved a 24.09% Word Error Rate (WER) and 4.98% Character Error Rate (CER) on the Common Voice 8 test set

Based on XLS-R architecture

Uses Facebook's wav2vec2-xls-r-300m, a 300-million-parameter model pretrained on unlabeled speech in 128 languages, as the base model
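XLS-R checkpoints fine-tuned for speech recognition in Transformers are typically loaded as Wav2Vec2ForCTC: the network emits one score vector per roughly 20 ms audio frame over a character vocabulary, and the text is recovered by CTC decoding. Assuming this checkpoint follows that setup (the card does not state it explicitly), the sketch below shows greedy CTC decoding in plain Python. The toy vocabulary is hypothetical; in Transformers' Wav2Vec2CTCTokenizer the padding token serves as the CTC blank and `|` marks word boundaries, which is what the default arguments assume.

```python
from typing import Dict, List, Optional, Sequence


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]],
    id_to_char: Dict[int, str],
    blank_id: int = 0,
    word_delimiter: str = "|",
) -> str:
    """Greedy CTC decoding: best token per frame, merge repeats, drop blanks."""
    # 1. Choose the highest-scoring token id in every frame.
    best_ids = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]

    chars: List[str] = []
    previous: Optional[int] = None
    for token_id in best_ids:
        # 2. Consecutive frames with the same id collapse to one character;
        #    a blank in between keeps genuine double letters apart.
        if token_id != previous and token_id != blank_id:
            chars.append(id_to_char[token_id])
        previous = token_id

    # 3. The word-delimiter token becomes a space; extra spaces are squeezed out.
    text = "".join(chars).replace(word_delimiter, " ")
    return " ".join(text.split())


# Hypothetical toy vocabulary; id 0 plays the role of the blank/padding token.
demo_vocab = {0: "<pad>", 1: "|", 2: "b", 3: "u", 4: "n", 5: "d", 6: "i"}
demo_ids = [2, 2, 0, 3, 4, 4, 1, 5, 0, 6]
demo_frames = [[1.0 if j == i else 0.0 for j in range(len(demo_vocab))] for i in demo_ids]

print(ctc_greedy_decode(demo_frames, demo_vocab))  # prints: bun di
```

With the real model, `processor.batch_decode` on the argmax ids performs the equivalent collapse-and-strip step, so this function is only meant to make the mechanism visible.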

Model Capabilities

Speech recognition

Speech-to-text

Romansh-Sursilvan dialect processing

Use Cases

Speech transcription

Romansh speech transcription

Convert speech content in the Romansh-Sursilvan dialect to text

Achieved 24.09% WER on the Common Voice 8 test set

Voice assistance technology

Romansh voice assistant

Develop voice-controlled applications for Romansh speakers

🚀 wav2vec2-xls-r-300m-rm-sursilv-d11

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the mozilla-foundation/common_voice_8_0 dataset (rm-sursilv configuration). It is designed for automatic speech recognition and aims to provide accurate speech-to-text conversion.

✨ Features

Language Support: Romansh-Sursilvan (rm-sursilv).
Datasets: Trained on the mozilla-foundation/common_voice_8_0 dataset.
Metrics: Evaluated using WER (Word Error Rate) and CER (Character Error Rate).

📦 Installation

The model card does not include installation steps.

💻 Usage Examples

The model card does not include code examples.
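The following is an unofficial sketch, not part of the original card, of how the checkpoint could be used with the Transformers `pipeline("automatic-speech-recognition", ...)` API. It assumes `transformers` and `torch` are installed and that the model ID from the evaluation command resolves on the Hugging Face Hub; passing a file path additionally requires ffmpeg. The helper names `load_asr`, `transcribe_file`, and `transcribe_array` are hypothetical.

```python
import sys
from functools import lru_cache

MODEL_ID = "DrishtiSharma/wav2vec2-xls-r-300m-rm-sursilv-d11"
TARGET_SAMPLING_RATE = 16_000  # wav2vec2 / XLS-R models expect 16 kHz mono audio


@lru_cache(maxsize=1)
def load_asr():
    """Build the pipeline once; the first call downloads the checkpoint from the Hub."""
    from transformers import pipeline  # imported lazily so this module loads without transformers

    return pipeline("automatic-speech-recognition", model=MODEL_ID)


def transcribe_file(path: str) -> str:
    """Transcribe an audio file; the pipeline decodes file paths with ffmpeg."""
    return load_asr()(path)["text"]


def transcribe_array(samples, sampling_rate: int = TARGET_SAMPLING_RATE) -> str:
    """Transcribe a 1-D NumPy float array of mono samples at the given sampling rate."""
    return load_asr()({"raw": samples, "sampling_rate": sampling_rate})["text"]


if __name__ == "__main__" and len(sys.argv) > 1:
    print(transcribe_file(sys.argv[1]))
```

Saved as, for example, `transcribe.py`, it would be run as `python transcribe.py clip.wav`. Caching the pipeline avoids reloading the 300M-parameter model for every clip.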

📚 Documentation

Model Information

Property	Details
Model Type	wav2vec2-xls-r-300m-rm-sursilv-d11
Training Data	mozilla-foundation/common_voice_8_0
Metrics	WER, CER

Model Performance

This model achieves the following results on the evaluation set:

Loss: 0.2511
WER: 0.2415
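WER counts the word-level edits needed to turn the model output into the reference, WER = (S + D + I) / N, where S, D, and I are substitutions, deletions, and insertions and N is the number of reference words; CER is the same ratio over characters. A WER of 0.2415 therefore means roughly 24 word errors per 100 reference words. The minimal sketch below computes both with a Levenshtein distance; the function names are illustrative, and the card does not specify which metric implementation produced its numbers.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,              # deletion: reference token missing from hypothesis
                curr[j - 1] + 1,          # insertion: extra token in hypothesis
                prev[j - 1] + (r != h),   # substitution, free when tokens match
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def corpus_wer(references: List[str], hypotheses: List[str]) -> float:
    """Corpus WER sums errors over all utterances, then divides by total reference words."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(references, hypotheses))
    return errors / sum(len(r.split()) for r in references)
```

For instance, `wer("a b c d", "a x c")` is 0.5: one substitution (b to x) plus one deletion (d) over four reference words. Test-set scores are usually corpus-level, which weights long utterances more than averaging per-utterance WER would.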

Evaluation Commands

To evaluate on mozilla-foundation/common_voice_8_0 with the test split:

python eval.py --model_id DrishtiSharma/wav2vec2-xls-r-300m-rm-sursilv-d11 --dataset mozilla-foundation/common_voice_8_0 --config rm-sursilv --split test --log_outputs
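Before scoring, evaluation of Common Voice fine-tunes normally lowercases the reference transcripts and strips punctuation the model was never trained to emit, and the 48 kHz Common Voice clips must be resampled to the 16 kHz the model expects. The card does not list the exact character set `eval.py` removes, so the punctuation class below is an assumption, and `normalize_transcript` is a hypothetical helper rather than code from that script.

```python
import re

# Assumed punctuation set (comma, ?, ., !, ;, :, quotes, %, ellipsis, dashes, hyphen);
# straight apostrophes are deliberately kept.
CHARS_TO_IGNORE = re.compile("[,?.!;:\"\u201c\u201d%\u2026\u2013\u2014-]")


def normalize_transcript(text: str) -> str:
    """Lowercase, drop ignored punctuation, and collapse runs of whitespace."""
    text = CHARS_TO_IGNORE.sub("", text.lower())
    return " ".join(text.split())


print(normalize_transcript("Bun di!  Co va?"))  # prints: bun di co va
```

Applying the same normalization to references and model outputs keeps punctuation differences from being counted as word errors.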

Evaluation on speech-recognition-community-v2/dev_data is not possible, because that dataset does not include Romansh-Sursilvan.

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 7e-05
train_batch_size: 32
eval_batch_size: 16
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 2000
num_epochs: 125.0
mixed_precision_training: Native AMP
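A sketch of how these hyperparameters map onto `transformers.TrainingArguments` keyword names, assuming a single device with no gradient accumulation (the card lists neither) and reading "Native AMP" as `fp16=True`. The total of 10,750 steps is derived from the training results table: 1500 steps at epoch 17.44 gives 86 steps per epoch, times 125 epochs. `linear_lr` reproduces the linear warmup-then-decay schedule so the learning rate at any step can be checked; `build_training_args` and `linear_lr` are hypothetical helper names.

```python
# Card hyperparameters expressed as transformers.TrainingArguments keyword arguments.
TRAINING_KWARGS = dict(
    learning_rate=7e-5,
    per_device_train_batch_size=32,  # assumes one device and no gradient accumulation
    per_device_eval_batch_size=16,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=2000,
    num_train_epochs=125,
    fp16=True,  # "Native AMP"; requires a CUDA device when TrainingArguments is built
)

STEPS_PER_EPOCH = 86  # derived from the training results table: 1500 steps / 17.44 epochs
TOTAL_STEPS = STEPS_PER_EPOCH * TRAINING_KWARGS["num_train_epochs"]  # 10750


def build_training_args(output_dir: str):
    from transformers import TrainingArguments  # lazy import; only needed for real training

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)


def linear_lr(step: int, peak: float = 7e-5, warmup: int = 2000, total: int = TOTAL_STEPS) -> float:
    """Linear warmup from 0 to `peak`, then linear decay back to 0 at `total` steps."""
    if step < warmup:
        return peak * step / max(1, warmup)
    return peak * max(0.0, (total - step) / max(1, total - warmup))
```

Under these assumptions the learning rate peaks at 7e-05 at step 2000 and is back to half that value (3.5e-05) at step 6375, midway through the decay.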

Training results

Training Loss	Epoch	Step	Validation Loss	WER
2.3958	17.44	1500	0.6808	0.6521
0.9663	34.88	3000	0.3023	0.3718
0.7963	52.33	4500	0.2588	0.3046
0.6893	69.77	6000	0.2436	0.2718
0.6148	87.21	7500	0.2521	0.2572
0.5556	104.65	9000	0.2490	0.2442
0.5258	122.09	10500	0.2515	0.2442
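Reading the table programmatically makes two points easy to check: validation loss is lowest at step 6000 (0.2436), while WER keeps improving until step 9000 and then stays at 0.2442, so loss and WER would pick different checkpoints. Each epoch is about 86 optimizer steps. The snippet below only re-reads the rows above; it is an illustration, not part of the training code.

```python
# (step, epoch, training_loss, validation_loss, wer), copied from the training results table
ROWS = [
    (1500, 17.44, 2.3958, 0.6808, 0.6521),
    (3000, 34.88, 0.9663, 0.3023, 0.3718),
    (4500, 52.33, 0.7963, 0.2588, 0.3046),
    (6000, 69.77, 0.6893, 0.2436, 0.2718),
    (7500, 87.21, 0.6148, 0.2521, 0.2572),
    (9000, 104.65, 0.5556, 0.2490, 0.2442),
    (10500, 122.09, 0.5258, 0.2515, 0.2442),
]

best_loss_step = min(ROWS, key=lambda row: row[3])[0]
best_wer = min(row[4] for row in ROWS)
best_wer_steps = [row[0] for row in ROWS if row[4] == best_wer]
steps_per_epoch = ROWS[0][0] / ROWS[0][1]

print(f"lowest validation loss at step {best_loss_step}")
print(f"lowest WER {best_wer} at steps {best_wer_steps}")
print(f"about {steps_per_epoch:.0f} optimizer steps per epoch")
```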

Framework versions

Transformers 4.17.0.dev0
PyTorch 1.10.2+cu102
Datasets 1.18.2.dev0
Tokenizers 0.11.0
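When trying to reproduce the reported numbers, it can help to compare the local environment with the versions listed above. A minimal sketch using the standard-library `importlib.metadata` (Python 3.8+), assuming the PyPI distribution name `torch` for PyTorch. The `.dev0` entries were development builds, so an exact match is unlikely; the comparison mainly shows how far the installed versions are from those used for training.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Versions listed in the card, keyed by PyPI distribution name.
CARD_VERSIONS = {
    "transformers": "4.17.0.dev0",
    "torch": "1.10.2+cu102",
    "datasets": "1.18.2.dev0",
    "tokenizers": "0.11.0",
}


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return the installed version of each package, or None if it is missing."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def report() -> None:
    for name, installed in installed_versions(CARD_VERSIONS).items():
        status = "match" if installed == CARD_VERSIONS[name] else "differs"
        print(f"{name}: card {CARD_VERSIONS[name]}, installed {installed or 'not installed'} ({status})")


if __name__ == "__main__":
    report()
```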

📄 License

This model is licensed under the Apache-2.0 license.
