wav2vec2-large-xls-r-300m-latvian
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m for high-quality automatic speech recognition in Latvian.
🚀 Quick Start
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the mozilla-foundation/common_voice_7_0 - LV dataset. Its results on the evaluation sets are listed under Model Performance below.
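A minimal usage sketch (not part of the original card) with the transformers ASR pipeline; the hub id below is a placeholder for the actual repository name, and "sample_lv.wav" is a hypothetical local audio file:

```python
# Minimal usage sketch (assumption: the model is published on the Hugging Face
# Hub; replace the placeholder id with the actual repository name).
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="<username>/wav2vec2-large-xls-r-300m-latvian",  # placeholder hub id
)

# "sample_lv.wav" is a hypothetical 16 kHz Latvian audio file.
result = asr("sample_lv.wav")
print(result["text"])
```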
✨ Features
- Multilingual Adaptability: Based on the pre-trained model facebook/wav2vec2-xls-r-300m, it can adapt to different language environments.
- High-precision Recognition: Achieved low WER and CER on multiple datasets, demonstrating high-precision speech recognition capabilities.
📦 Installation
No installation steps are given in the original card; as a working assumption, a standard setup such as `pip install transformers torch torchaudio` is enough to run the Quick Start example above.
📚 Documentation
Model Information
| Property | Details |
|---|---|
| Model Type | Fine-tuned version of facebook/wav2vec2-xls-r-300m |
| Training Data | mozilla-foundation/common_voice_7_0 |
| Tags | automatic-speech-recognition, generated_from_trainer, hf-asr-leaderboard, lv, model_for_talk, mozilla-foundation/common_voice_7_0, robust-speech-event |
| License | apache-2.0 |
Model Performance
The model has been evaluated on multiple datasets, with the following results (WER and CER are percentages):
- Common Voice 7:
  - Test WER: 16.977
  - Test CER: 4.23
- Robust Speech Event - Dev Data:
  - Test WER: 45.247
  - Test CER: 16.924
- Robust Speech Event - Test Data:
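WER/CER figures like these can be reproduced from reference and predicted transcripts with a tool such as jiwer; a minimal sketch follows (jiwer is an assumption, the original card does not name an evaluation tool):

```python
# Sketch of computing WER/CER from reference and predicted transcripts
# with jiwer (assumed tooling; not named in the original card).
import jiwer

references = ["viņš runā latviski"]   # ground-truth transcripts
hypotheses = ["viņš runā latviski"]   # model predictions

wer = jiwer.wer(references, hypotheses)  # word error rate (fraction)
cer = jiwer.cer(references, hypotheses)  # character error rate (fraction)
print(f"WER: {wer * 100:.3f}%, CER: {cer * 100:.3f}%")
```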
Training Procedure
Training Hyperparameters
The following hyperparameters were used during training (see the TrainingArguments sketch after this list):
- learning_rate: 7e-05
- train_batch_size: 32
- eval_batch_size: 1
- seed: 42
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2000
- num_epochs: 100.0
- mixed_precision_training: Native AMP
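These hyperparameters map onto transformers TrainingArguments roughly as follows. This is a sketch assuming the Hugging Face Trainer was used (suggested by the generated_from_trainer tag); the output_dir is hypothetical:

```python
# Sketch: the listed hyperparameters expressed as TrainingArguments.
# Assumptions: HF Trainer workflow; output_dir is hypothetical.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./wav2vec2-large-xls-r-300m-latvian",  # hypothetical
    learning_rate=7e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=2000,
    num_train_epochs=100.0,
    fp16=True,  # "Native AMP" mixed-precision training
)
```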
Training Results
| Training Loss | Epoch | Step | Validation Loss | Validation WER |
|---|---|---|---|---|
| 1.4235 | 12.82 | 2000 | 0.4475 | 0.4551 |
| 0.9383 | 25.64 | 4000 | 0.2235 | 0.2328 |
| 0.8359 | 38.46 | 6000 | 0.2004 | 0.2098 |
| 0.7633 | 51.28 | 8000 | 0.1960 | 0.1882 |
| 0.7001 | 64.1 | 10000 | 0.1902 | 0.1809 |
| 0.652 | 76.92 | 12000 | 0.1979 | 0.1775 |
| 0.6025 | 89.74 | 14000 | 0.1866 | 0.1696 |
Framework Versions
- Transformers 4.16.0.dev0
- Pytorch 1.10.1+cu102
- Datasets 1.17.1.dev0
- Tokenizers 0.11.0
🔧 Technical Details
The model is based on the pre-trained facebook/wav2vec2-xls-r-300m checkpoint and fine-tuned on the mozilla-foundation/common_voice_7_0 dataset. Training used the hyperparameters listed above (learning rate 7e-05, batch size 32, 2000 warmup steps, etc.); over 100 epochs the model converged steadily, with validation WER falling from 0.4551 at step 2000 to 0.1696 at step 14000.
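To make the learning-rate schedule concrete, here is an illustrative sketch of the linear warmup/decay described above. Both the get_linear_schedule_with_warmup helper and the 14000 total steps (taken from the last row of the training table) are assumptions for illustration:

```python
# Illustrative sketch of the linear LR schedule: 2000 warmup steps, then a
# linear decay. 14000 total steps is taken from the last row of the training
# table and is an assumption for this example.
import torch
from transformers import get_linear_schedule_with_warmup

dummy_param = torch.nn.Parameter(torch.zeros(1))  # stand-in for model weights
optimizer = torch.optim.Adam([dummy_param], lr=7e-5, betas=(0.9, 0.999), eps=1e-8)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=2000, num_training_steps=14000
)
```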
📄 License
This model is licensed under the apache-2.0 license.