wav2vec2-large-xls-r-300m-myv-v1: Open-Source Speech Recognition Model for the Erzya Language

Developed by DrishtiSharma

This is an Erzya speech recognition model, fine-tuned from facebook/wav2vec2-xls-r-300m on the Erzya (myv) subset of mozilla-foundation/common_voice_8_0.

Speech Recognition

Transformers

License: Apache-2.0

Tags: Erzya speech recognition, Multi-dialect robustness, Low-resource language optimization

Release Time: 3/2/2022

Model Overview

This model is an automatic speech recognition (ASR) model for the Erzya language, fine-tuned on the Common Voice 8 dataset.

Model Features

Multilingual pretraining

Built on XLS-R, a wav2vec 2.0 model pretrained on multilingual speech; this checkpoint is fine-tuned to transcribe Erzya

Efficient fine-tuning

Optimized on the Erzya dataset from Common Voice 8

Robust performance

Achieves a CER of 0.13 and a WER of 0.60 on the Common Voice 8 Erzya test split

Model Capabilities

Erzya speech recognition

Automatic speech-to-text

Cross-language speech processing

Use Cases

Speech technology

Erzya voice assistant

Develop voice interaction applications for Erzya speakers

Speech transcription service

Convert Erzya speech content into text

🚀 wav2vec2-large-xls-r-300m-myv-v1

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the myv (Erzya) configuration of the mozilla-foundation/common_voice_8_0 dataset. It is designed for automatic speech recognition of Erzya speech.
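As a rough illustration of how the checkpoint might be used for transcription, the sketch below wraps the Transformers `pipeline` API in a small helper. This is not taken from the original model card: the helper name `transcribe` is hypothetical, and it assumes that `transformers`, a PyTorch backend, and `ffmpeg` (used by the pipeline to decode audio files) are installed.

```python
# Hugging Face model ID, as given in the evaluation command on this page.
MODEL_ID = "DrishtiSharma/wav2vec2-large-xls-r-300m-myv-v1"


def transcribe(audio_path: str) -> str:
    """Return the transcript for one audio file (hypothetical helper)."""
    # Imported lazily so this module loads even without transformers installed.
    from transformers import pipeline

    # Downloads the checkpoint on first use and builds an ASR pipeline.
    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)

    # Given a file path, the pipeline decodes the audio and resamples it
    # to the sampling rate expected by the model's feature extractor.
    result = asr(audio_path)
    return result["text"]
```

Calling `transcribe("sample_myv.wav")`, where the file name is only a placeholder, would return the predicted Erzya text as a string. Building the pipeline once and reusing it is preferable when transcribing many files.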

📚 Documentation

Model Information

Property	Details
Language	myv
License	apache-2.0
Tags	automatic-speech-recognition, mozilla-foundation/common_voice_8_0, generated_from_trainer, myv, robust-speech-event, model_for_talk, hf-asr-leaderboard
Datasets	mozilla-foundation/common_voice_8_0

Model Index

The model wav2vec2-large-xls-r-300m-myv-v1 has the following results:

Task: Automatic Speech Recognition
- Dataset: Common Voice 8 (mozilla-foundation/common_voice_8_0, myv configuration)
  - Metrics:
    - Test WER: 0.599548532731377
    - Test CER: 0.12953851902597
- Dataset: Robust Speech Event - Dev Data (speech-recognition-community-v2/dev_data, myv configuration)
  - Metrics:
    - Test WER: NA
    - Test CER: NA
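For readers unfamiliar with the two metrics, here is a minimal sketch of how WER and CER can be computed from an edit distance. It is an illustration only, not necessarily how `eval.py` computes its scores, and the English sentences are toy data rather than model output.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits divided by reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


# Toy example: one misspelled word and one dropped word.
reference = "the cat sat on the mat"
hypothesis = "the cat sit on mat"
print(round(wer(reference, hypothesis), 3))  # 0.333
print(round(cer(reference, hypothesis), 3))  # 0.227
```

A single wrong character turns a whole word into an error, which is one plausible reason the WER reported here (about 0.60) is much higher than the CER (about 0.13). Corpus-level scores are usually obtained by summing edits and reference lengths over all utterances rather than averaging per-utterance rates.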

Evaluation Results

On the evaluation set, the final checkpoint (step 3900) achieves the following results:

Loss: 0.8537
WER: 0.6160

Evaluation Commands

Basic Usage

To evaluate on mozilla-foundation/common_voice_8_0 with the test split:

python eval.py --model_id DrishtiSharma/wav2vec2-large-xls-r-300m-myv-v1 --dataset mozilla-foundation/common_voice_8_0 --config myv --split test --log_outputs

To evaluate on speech-recognition-community-v2/dev_data:

Not applicable: Erzya is not included in speech-recognition-community-v2/dev_data.

Training Hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.000222
train_batch_size: 16
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 32
optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 1000
num_epochs: 150
mixed_precision_training: Native AMP
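The relationship between some of these values can be sketched in a few lines. The code below assumes a single training device (consistent with 16 × 2 = 32) and takes the total step count of 3900 from the last row of the training results table. The function names are illustrative, and the schedule mirrors the usual linear warmup and decay shape rather than reproducing the exact training code.

```python
LEARNING_RATE = 0.000222
TRAIN_BATCH_SIZE = 16
GRAD_ACCUM_STEPS = 2
WARMUP_STEPS = 1000
TOTAL_STEPS = 3900  # assumption: final step logged in the training results


def effective_batch_size(per_device: int, accum: int, num_devices: int = 1) -> int:
    """Samples contributing to each optimizer update."""
    return per_device * accum * num_devices


def linear_schedule_lr(step: int) -> float:
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return LEARNING_RATE * remaining / (TOTAL_STEPS - WARMUP_STEPS)


print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))  # 32
print(linear_schedule_lr(WARMUP_STEPS) == LEARNING_RATE)         # True
```

Under these assumptions, warmup covers roughly a quarter of training (1000 of 3900 steps), after which the learning rate falls linearly to zero at the final step.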

Training Results

Training Loss	Epoch	Step	Validation Loss	Wer
19.453	1.92	50	16.4001	1.0
9.6875	3.85	100	5.4468	1.0
4.9988	5.77	150	4.3507	1.0
4.1148	7.69	200	3.6753	1.0
3.4922	9.62	250	3.3103	1.0
3.2443	11.54	300	3.1741	1.0
3.164	13.46	350	3.1346	1.0
3.0954	15.38	400	3.0428	1.0
3.0076	17.31	450	2.9137	1.0
2.6883	19.23	500	2.1476	0.9978
1.5124	21.15	550	0.8955	0.8225
0.8711	23.08	600	0.6948	0.7591
0.6695	25.0	650	0.6683	0.7636
0.5606	26.92	700	0.6821	0.7435
0.503	28.85	750	0.7220	0.7516
0.4528	30.77	800	0.6638	0.7324
0.4219	32.69	850	0.7120	0.7435
0.4109	34.62	900	0.7122	0.7511
0.3887	36.54	950	0.7179	0.7199
0.3895	38.46	1000	0.7322	0.7525
0.391	40.38	1050	0.6850	0.7364
0.3537	42.31	1100	0.7571	0.7279
0.3267	44.23	1150	0.7575	0.7257
0.3195	46.15	1200	0.7580	0.6998
0.2891	48.08	1250	0.7452	0.7101
0.294	50.0	1300	0.7316	0.6945
0.2854	51.92	1350	0.7241	0.6757
0.2801	53.85	1400	0.7532	0.6887
0.2502	55.77	1450	0.7587	0.6811
0.2427	57.69	1500	0.7231	0.6851
0.2311	59.62	1550	0.7288	0.6632
0.2176	61.54	1600	0.7711	0.6664
0.2117	63.46	1650	0.7914	0.6940
0.2114	65.38	1700	0.8065	0.6918
0.1913	67.31	1750	0.8372	0.6945
0.1897	69.23	1800	0.8051	0.6869
0.1865	71.15	1850	0.8076	0.6740
0.1844	73.08	1900	0.7935	0.6708
0.1757	75.0	1950	0.8015	0.6610
0.1636	76.92	2000	0.7614	0.6414
0.1637	78.85	2050	0.8123	0.6592
0.1599	80.77	2100	0.7907	0.6566
0.1498	82.69	2150	0.8641	0.6757
0.1545	84.62	2200	0.7438	0.6682
0.1433	86.54	2250	0.8014	0.6624
0.1427	88.46	2300	0.7758	0.6646
0.1423	90.38	2350	0.7741	0.6423
0.1298	92.31	2400	0.7938	0.6414
0.1111	94.23	2450	0.7976	0.6467
0.1243	96.15	2500	0.7916	0.6481
0.1215	98.08	2550	0.7594	0.6392
0.113	100.0	2600	0.8236	0.6392
0.1077	101.92	2650	0.7959	0.6347
0.0988	103.85	2700	0.8189	0.6392
0.0953	105.77	2750	0.8157	0.6414
0.0889	107.69	2800	0.7946	0.6369
0.0929	109.62	2850	0.8255	0.6360
0.0822	111.54	2900	0.8320	0.6334
0.086	113.46	2950	0.8539	0.6490
0.0825	115.38	3000	0.8438	0.6418
0.0727	117.31	3050	0.8568	0.6481
0.0717	119.23	3100	0.8447	0.6512
0.0815	121.15	3150	0.8470	0.6445
0.0689	123.08	3200	0.8264	0.6249
0.0726	125.0	3250	0.7981	0.6169
0.0648	126.92	3300	0.8237	0.6200
0.0632	128.85	3350	0.8416	0.6249
0.06	130.77	3400	0.8276	0.6173
0.0616	132.69	3450	0.8429	0.6209
0.0614	134.62	3500	0.8485	0.6271
0.0539	136.54	3550	0.8598	0.6218
0.0555	138.46	3600	0.8557	0.6169
0.0604	140.38	3650	0.8436	0.6186
0.0556	142.31	3700	0.8428	0.6178
0.051	144.23	3750	0.8440	0.6142
0.0526	146.15	3800	0.8566	0.6142
0.052	148.08	3850	0.8544	0.6178
0.0519	150.0	3900	0.8537	0.6160
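One pattern in the table is that validation loss and WER do not bottom out at the same step. The sketch below uses a handful of rows copied from the table (a subset, chosen for illustration) to show how the choice of checkpoint depends on which metric is used; the `EvalRow` type is a hypothetical helper.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    step: int
    val_loss: float
    wer: float


# Subset of rows copied from the training results table.
ROWS = [
    EvalRow(600, 0.6948, 0.7591),
    EvalRow(800, 0.6638, 0.7324),
    EvalRow(2000, 0.7614, 0.6414),
    EvalRow(3250, 0.7981, 0.6169),
    EvalRow(3750, 0.8440, 0.6142),
    EvalRow(3900, 0.8537, 0.6160),
]

best_by_loss = min(ROWS, key=lambda r: r.val_loss)
best_by_wer = min(ROWS, key=lambda r: r.wer)

print(best_by_loss.step)  # 800
print(best_by_wer.step)   # 3750
```

In the full table the lowest validation loss (0.6638) also occurs at step 800, while WER keeps improving long afterwards and reaches its minimum of 0.6142 at steps 3750 and 3800. Selecting a checkpoint by WER rather than by loss would therefore give a different result for this run.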

Framework Versions

Transformers 4.16.2
PyTorch 1.10.0+cu111
Datasets 1.18.2
Tokenizers 0.11.0

📄 License

This model is released under the Apache-2.0 license.
