wav2vec2-large-xls-r-300m-mongolian: Open-Source Speech Recognition Model for Mongolian

Wav2vec2 Large XLS-R 300M Mongolian

Developed by infinitejoy

An automatic speech recognition model fine-tuned on Mongolian datasets based on facebook/wav2vec2-xls-r-300m

Open-source license: Apache-2.0

Tags: #Mongolian speech recognition #Low-resource language processing #Multi-dialect adaptation

Downloads 33

Release date: 3/2/2022

Model Overview

This is an optimized automatic speech recognition (ASR) model for Mongolian, based on the XLS-R architecture and fine-tuned on the Common Voice 7.0 Mongolian dataset.

Model Features

Mongolian optimization

Specifically optimized and fine-tuned for Mongolian speech recognition

Based on XLS-R architecture

Built on facebook/wav2vec2-xls-r-300m, the 300M-parameter XLS-R checkpoint pre-trained on multilingual speech

Multi-dataset evaluation

Evaluated on Common Voice 7.0 and on the Robust Speech Event dev and test data

Model Capabilities

Mongolian speech recognition

Speech-to-text

Conversational speech processing

Use Cases

Speech transcription

Mongolian speech-to-text

Convert Mongolian speech content into text

Word error rate (WER) of 44.7% on the Common Voice 7.0 Mongolian test set

Voice assistants

Mongolian voice command recognition

Speech recognition component for Mongolian voice assistants or voice control systems

🚀 XLS-R-300M - Mongolian

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the MOZILLA-FOUNDATION/COMMON_VOICE_7_0 - MN dataset. It is intended for automatic speech recognition (ASR) of Mongolian audio.

🚀 Quick Start

Fine-tuned from facebook/wav2vec2-xls-r-300m on the MOZILLA-FOUNDATION/COMMON_VOICE_7_0 - MN dataset, the model achieves the following results on the evaluation set:

Loss: 0.6003
Wer: 0.4473
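
To try the model, a minimal sketch using the Hugging Face Transformers ASR pipeline might look like the following. The Hub identifier is an assumption composed from the developer name and model name given on this page, and the audio file name in the usage comment is hypothetical. The sketch also assumes that transformers and torch are installed and that ffmpeg is available for decoding audio files.

```python
# Assumed Hub id: "<developer>/<model name>" as listed on this page.
MODEL_ID = "infinitejoy/wav2vec2-large-xls-r-300m-mongolian"


def transcribe(audio_path: str) -> str:
    """Transcribe one audio file and return the recognized text."""
    # Imported lazily so that defining this helper does not require the
    # heavy dependencies until a transcription is actually requested.
    from transformers import pipeline

    # Downloads the checkpoint on first use, so network access is needed.
    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    result = asr(audio_path)  # dict with a "text" key
    return result["text"]


# Example usage (hypothetical file name):
# print(transcribe("sample_mn.wav"))
```

Building the pipeline inside the function keeps the sketch short. For repeated calls it would be cheaper to create the pipeline once and reuse it.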

✨ Features

Automatic Speech Recognition: Specialized for Mongolian automatic speech recognition tasks.
Fine-tuned: Based on the pre-trained facebook/wav2vec2-xls-r-300m model, fine-tuned on the MOZILLA-FOUNDATION/COMMON_VOICE_7_0 - MN dataset.

📊 Model Information

Property	Details
Model Type	wav2vec2-large-xls-r-300m-mongolian
Training Data	mozilla-foundation/common_voice_7_0
License	apache-2.0
Tags	automatic-speech-recognition, generated_from_trainer, hf-asr-leaderboard, mn, model_for_talk, mozilla-foundation/common_voice_7_0, robust-speech-event

📈 Model Results

Task	Dataset	Test WER (%)	Test CER (%)
Automatic Speech Recognition	Common Voice 7 (mozilla-foundation/common_voice_7_0, args: mn)	44.709	13.532
Automatic Speech Recognition	Robust Speech Event - Dev Data (speech-recognition-community-v2/dev_data, args: mn)	76.643	36.997
Automatic Speech Recognition	Robust Speech Event - Test Data (speech-recognition-community-v2/eval_data, args: mn)	78.45	N/A
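
For readers unfamiliar with the two metrics in the table, the sketch below shows how WER and CER are commonly computed for a single utterance: the Levenshtein edit distance between reference and hypothesis, divided by the reference length, counted in words for WER and in characters for CER. This is an illustration only, not the evaluation script used for the numbers above. Corpus-level scores are usually aggregated over all utterances, and any text normalization applied before scoring is not documented here. The example sentences are made up for illustration.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from the reference
                            curr[j - 1] + 1,      # insertion into the reference
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate for one utterance (whitespace tokenization)."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate for one utterance (spaces count as characters)."""
    return edit_distance(reference, hypothesis) / len(reference)


# Illustrative pair: one missing character in the second word.
reference = "сайн байна уу"
hypothesis = "сайн байн уу"
print(f"WER = {wer(reference, hypothesis):.3f}")
print(f"CER = {cer(reference, hypothesis):.3f}")
```

In this example a single missing character makes one of three words wrong but only one of thirteen characters wrong. That is one reason CER is typically much lower than WER, as in the table above.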

🔧 Technical Details

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 32
eval_batch_size: 1
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 2000
num_epochs: 100.0
mixed_precision_training: Native AMP
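
To make the learning-rate settings above concrete, the following sketch reproduces the shape of a linear schedule with warmup: the rate rises linearly from 0 to the peak over the warmup steps, then decays linearly to 0 at the end of training. The total step count is an assumption. It is estimated from the training log on this page, where 2000 steps correspond to about 15.87 epochs (roughly 126 steps per epoch, so about 12,600 steps for 100 epochs).

```python
PEAK_LR = 3e-4        # learning_rate from the list above
WARMUP_STEPS = 2000   # lr_scheduler_warmup_steps from the list above
TOTAL_STEPS = 12_600  # assumption: ~126 steps/epoch x 100 epochs (estimated)


def linear_lr(step: int) -> float:
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < WARMUP_STEPS:
        # Warmup phase: ramp from 0 up to the peak rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Decay phase: fall linearly from the peak to 0 at TOTAL_STEPS.
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


for step in (0, 1000, 2000, 7300, 12_600):
    print(f"step {step:>6}: lr = {linear_lr(step):.2e}")
```

Under these assumptions, the first logged checkpoint (step 2000) coincides with the end of warmup, so all later checkpoints were trained with a steadily decreasing learning rate.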

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
1.3677	15.87	2000	0.6432	0.6198
1.1379	31.75	4000	0.6196	0.5592
1.0093	47.62	6000	0.5828	0.5117
0.8888	63.49	8000	0.5754	0.4822
0.7985	79.37	10000	0.5987	0.4690
0.6970	95.24	12000	0.6014	0.4471
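
The table above can be read more easily with a few lines of code. The sketch below copies the logged rows, estimates the number of optimizer steps per epoch, and finds the best checkpoint by WER and by validation loss. The field names are chosen here for illustration and are not part of any library.

```python
from typing import NamedTuple


class LogRow(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float
    wer: float


# Rows copied from the training results table above.
LOG = [
    LogRow(1.3677, 15.87, 2000, 0.6432, 0.6198),
    LogRow(1.1379, 31.75, 4000, 0.6196, 0.5592),
    LogRow(1.0093, 47.62, 6000, 0.5828, 0.5117),
    LogRow(0.8888, 63.49, 8000, 0.5754, 0.4822),
    LogRow(0.7985, 79.37, 10000, 0.5987, 0.4690),
    LogRow(0.6970, 95.24, 12000, 0.6014, 0.4471),
]

# Steps per epoch, estimated from the first logged row.
steps_per_epoch = round(LOG[0].step / LOG[0].epoch)

best_by_wer = min(LOG, key=lambda row: row.wer)
best_by_loss = min(LOG, key=lambda row: row.val_loss)

print(f"steps per epoch ~ {steps_per_epoch}")
print(f"lowest WER at step {best_by_wer.step}: {best_by_wer.wer}")
print(f"lowest validation loss at step {best_by_loss.step}: {best_by_loss.val_loss}")
```

The two criteria disagree. Validation loss is lowest at step 8000, while WER keeps improving through step 12000. Selecting a checkpoint by WER rather than by loss therefore matters for this run.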

Framework versions

Transformers 4.16.0.dev0
PyTorch 1.10.1+cu102
Datasets 1.17.1.dev0
Tokenizers 0.11.0

📄 License

This model is released under the Apache-2.0 license.
