wav2vec2-large-xls-r-300m-romansh-sursilvan: Open-source Speech Recognition Model for the Sursilvan Dialect of Romansh

Wav2vec2 Large Xls R 300m Romansh Sursilvan

Developed by infinitejoy

Automatic speech recognition model based on facebook/wav2vec2-xls-r-300m and fine-tuned on the Romansh Sursilvan subset of Common Voice 7

Speech Recognition

Transformers

Open Source License: Apache-2.0

Tags: Romansh speech recognition, Low word error rate, Multilingual support

Downloads 15

Release Time: 3/2/2022

Model Overview

This is an automatic speech recognition (ASR) model for the Romansh Sursilvan dialect, fine-tuned from the XLS-R-300M checkpoint. It achieves a 19.81% word error rate (WER) on the Common Voice 7 test set.

Model Features

Low word error rate

Achieved 19.81% WER and 4.15% CER on the Romansh Sursilvan dialect test set

Based on XLS-R architecture

Uses XLS-R-300M, a multilingual self-supervised speech model with roughly 300 million parameters, as the base model

Optimized for low-resource languages

Fine-tuned specifically for Romansh Sursilvan, a relatively low-resource language variety

Model Capabilities

Speech-to-text

Romansh Sursilvan dialect recognition

Continuous speech recognition

Use Cases

Speech transcription

Romansh speech transcription

Convert speech content in the Romansh Sursilvan dialect to text

19.81% word error rate, 4.15% character error rate

Voice assistants

Romansh voice command recognition

For supporting voice assistants and smart devices in Romansh

🚀 XLS-R-300M - Romansh Sursilvan

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the MOZILLA-FOUNDATION/COMMON_VOICE_7_0 - RM-SURSILV dataset. It is designed for automatic speech recognition and aims to provide accurate speech-to-text conversion for Romansh Sursilvan.

🚀 Quick Start

This section will be added when there are actual quick - start steps.

✨ Features

Fine-tuned on the MOZILLA-FOUNDATION/COMMON_VOICE_7_0 - RM-SURSILV dataset for Romansh Sursilvan speech recognition.
Achieves a WER (Word Error Rate) of 19.81% and a CER (Character Error Rate) of 4.15% on the Common Voice 7 test set.

📦 Installation

This section will be added when there are actual installation steps.

💻 Usage Examples

This section will be added when there are actual code examples.
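Until official examples are published, here is a minimal sketch of how a wav2vec2 CTC checkpoint like this one is typically run with the Transformers library. The Hub ID in `MODEL_ID` is an assumption assembled from the developer and model names on this page. The helper name `transcribe`, the constant `TARGET_SAMPLE_RATE`, and the use of `librosa` for audio loading are illustrative choices, not part of the original model card.

```python
# Assumption: Hub ID built from the developer name and model name shown on this page.
MODEL_ID = "infinitejoy/wav2vec2-large-xls-r-300m-romansh-sursilvan"
TARGET_SAMPLE_RATE = 16_000  # XLS-R checkpoints are pretrained on 16 kHz audio


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file using greedy (argmax) CTC decoding."""
    # Heavy dependencies are imported lazily so this module loads without them.
    import librosa
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()

    # librosa.load resamples to the requested rate and downmixes to mono.
    speech, _ = librosa.load(audio_path, sr=TARGET_SAMPLE_RATE, mono=True)
    inputs = processor(speech, sampling_rate=TARGET_SAMPLE_RATE, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits

    # Pick the most likely token per frame; the tokenizer collapses CTC repeats and blanks.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Calling `transcribe("clip.wav")` (a hypothetical file name) would download the checkpoint on first use and return the transcription as a string. It requires `transformers`, `torch`, and `librosa` to be installed.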

📚 Documentation

Model Performance

It achieves the following results on the evaluation set:

Loss: 0.2163
Wer: 0.1981
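
For readers unfamiliar with the metrics, the sketch below shows how WER and CER are commonly defined: the edit distance between reference and hypothesis, divided by the reference length, counted over words or characters respectively. The function names and the English sentences in the usage note are hypothetical, and this is not the evaluation script that produced the numbers above. Evaluation tools usually aggregate errors over a whole test set rather than scoring one utterance at a time.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    previous = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        current = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(
                previous[j] + 1,         # deletion from the reference
                current[j - 1] + 1,      # insertion into the reference
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate for one reference/hypothesis pair."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate for one reference/hypothesis pair (spaces count as characters)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

For example, scoring the hypothesis "the cat sit on mat" against the reference "the cat sat on the mat" gives one substituted word and one deleted word out of six, so `wer` returns 2/6.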

Model Index

Name: XLS-R-300M - Romansh Sursilvan
Results:
- Task:
  - Name: Automatic Speech Recognition
  - Type: automatic-speech-recognition
- Dataset:
  - Name: Common Voice 7
  - Type: mozilla-foundation/common_voice_7_0
  - Args: rm-sursilv
- Metrics:
  - Name: Test WER
    Type: wer
    Value: 19.816
  - Name: Test CER
    Type: cer
    Value: 4.153

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 7e-05
train_batch_size: 32
eval_batch_size: 1
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 2000
num_epochs: 120.0
mixed_precision_training: Native AMP
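
To illustrate how `learning_rate`, `lr_scheduler_type: linear`, and `lr_scheduler_warmup_steps` interact, the sketch below computes the learning rate at a given optimizer step under the usual linear warmup followed by linear decay to zero. The total step count is an assumption: the training log in this card reports step 2000 at epoch 23.81, which suggests about 84 steps per epoch and roughly 10,080 steps over 120 epochs. The function name `linear_schedule_lr` is illustrative.

```python
BASE_LR = 7e-05
WARMUP_STEPS = 2000
# Assumption: 120 epochs x ~84 steps per epoch, inferred from the training log in this card.
TOTAL_STEPS = 10_080


def linear_schedule_lr(step: int,
                       base_lr: float = BASE_LR,
                       warmup_steps: int = WARMUP_STEPS,
                       total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay: fall linearly from base_lr at the end of warmup to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)
```

Under these assumptions the rate peaks at 7e-05 at step 2000, is back down to half of that around step 6040, and reaches zero at the end of training.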

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
1.1004	23.81	2000	0.3710	0.4191
0.7002	47.62	4000	0.2342	0.2562
0.5573	71.43	6000	0.2175	0.2177
0.4799	95.24	8000	0.2109	0.1987
0.4511	119.05	10000	0.2164	0.1975
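
The table above can be read programmatically. The sketch below copies the five logged rows into a small structure and derives a few quantities from them: the approximate number of optimizer steps per epoch, the checkpoint with the lowest WER, the checkpoint with the lowest validation loss, and the relative WER reduction between the first and last evaluation. The names `LogRow` and `TRAINING_LOG` are illustrative and not part of the original card.

```python
from typing import NamedTuple


class LogRow(NamedTuple):
    training_loss: float
    epoch: float
    step: int
    validation_loss: float
    wer: float


# Values copied from the training results table.
TRAINING_LOG = [
    LogRow(1.1004, 23.81, 2000, 0.3710, 0.4191),
    LogRow(0.7002, 47.62, 4000, 0.2342, 0.2562),
    LogRow(0.5573, 71.43, 6000, 0.2175, 0.2177),
    LogRow(0.4799, 95.24, 8000, 0.2109, 0.1987),
    LogRow(0.4511, 119.05, 10000, 0.2164, 0.1975),
]

# Steps per epoch, estimated from the first logged row.
steps_per_epoch = round(TRAINING_LOG[0].step / TRAINING_LOG[0].epoch)

# The lowest WER and the lowest validation loss need not occur at the same step.
best_by_wer = min(TRAINING_LOG, key=lambda row: row.wer)
best_by_loss = min(TRAINING_LOG, key=lambda row: row.validation_loss)

# Relative WER reduction between the first and last evaluation.
first, last = TRAINING_LOG[0], TRAINING_LOG[-1]
relative_wer_reduction = (first.wer - last.wer) / first.wer

print(f"steps per epoch: {steps_per_epoch}")
print(f"best WER at step {best_by_wer.step}: {best_by_wer.wer}")
print(f"best validation loss at step {best_by_loss.step}: {best_by_loss.validation_loss}")
print(f"relative WER reduction: {relative_wer_reduction:.1%}")
```

In this log the lowest validation loss occurs at step 8000 while the lowest WER occurs at step 10000, so which checkpoint counts as best depends on the selection metric.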

Framework versions

Transformers 4.16.0.dev0
Pytorch 1.10.1+cu102
Datasets 1.17.1.dev0
Tokenizers 0.11.0

📄 License

This model is licensed under the Apache-2.0 license.
