Wav2Vec2-XLSR Estonian Open-Source Speech Recognition Model


Developed by sammy786

An automatic speech recognition model for Estonian, fine-tuned from the facebook/wav2vec2-xls-r-1b model.

Speech Recognition

Transformers

License: Apache-2.0

Tags: Estonian speech recognition, XLS-R fine-tuned model, Multi-scenario speech transcription

Downloads 21

Release Time : 3/2/2022

Model Overview

This model is optimized for Estonian automatic speech recognition (ASR) tasks, trained on the Mozilla Common Voice 8.0 dataset.

Model Features

High-performance speech recognition

Achieves 23.61% WER and 4.6% CER on the Common Voice test set

Fine-tuned large-scale pre-trained model

Fine-tuned from the 1-billion-parameter wav2vec2-xls-r-1b model

Multi-scenario adaptability

Evaluated on both standard speech and robust speech event datasets

Model Capabilities

Estonian speech recognition

Conversational speech-to-text

Robust speech processing

Use Cases

Speech transcription

Voice assistant

Used for developing Estonian voice assistants

Meeting minutes

Automatically transcribe Estonian meeting content into text

Speech analysis

Speech content analysis

Analyze Estonian speech content

🚀 sammy786/wav2vec2-xlsr-estonian

This model is a fine-tuned version of facebook/wav2vec2-xls-r-1b on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - et dataset. It targets automatic speech recognition for Estonian.

🚀 Quick Start

The model achieves the following results on the evaluation set (a 10 percent hold-out from the merged train, dev, and other splits):

Loss: 17.94
Wer: 30.38
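
The card does not include an inference snippet. The following is a minimal sketch, assuming the checkpoint loads through the standard Transformers automatic-speech-recognition pipeline and that ffmpeg is available for decoding audio files. The function name transcribe and the file path in the comment are hypothetical and not part of the original card.

```python
MODEL_ID = "sammy786/wav2vec2-xlsr-estonian"  # model id as named in this card


def transcribe(audio_path: str) -> str:
    """Transcribe one audio file with the fine-tuned checkpoint."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    # The pipeline downloads the checkpoint from the Hub on first use.
    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    result = asr(audio_path)  # dict containing the decoded text
    return result["text"]


# Example call (hypothetical file path):
# print(transcribe("sample_et.wav"))
```

Building the pipeline inside the function keeps the sketch short; for repeated calls it would be cheaper to build it once and reuse it.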

✨ Features

Fine-Tuned Model: Based on facebook/wav2vec2-xls-r-1b, fine-tuned on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - et dataset.
Multiple Datasets Support: Can be evaluated on different datasets, such as mozilla-foundation/common_voice_8_0 and the speech-recognition-community-v2 datasets.

📚 Documentation

Model description

The pre-trained "facebook/wav2vec2-xls-r-1b" checkpoint was fine-tuned for Estonian speech recognition.

Intended uses & limitations

More information needed

Training and evaluation data

Training data: Common Voice Estonian train.tsv, dev.tsv and other.tsv

Training procedure

To create the training set, all available splits were concatenated and a 90-10 train/evaluation split was applied.
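
The merge-then-split step can be sketched as follows. This is an illustration only: the card does not say how the split was shuffled, so reusing the training seed 13 here is an assumption, and the row lists are hypothetical stand-ins for the contents of train.tsv, dev.tsv and other.tsv.

```python
import random


def split_90_10(rows, seed=13):
    """Shuffle rows deterministically and return (train, eval) at a 90/10 ratio."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 9 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


# Hypothetical stand-ins for the rows of the three TSV files.
train_rows = [f"train_{i}" for i in range(60)]
dev_rows = [f"dev_{i}" for i in range(25)]
other_rows = [f"other_{i}" for i in range(15)]

# Append all splits first, then carve out the 10 percent evaluation set.
merged = train_rows + dev_rows + other_rows
train_set, eval_set = split_90_10(merged)
print(len(train_set), len(eval_set))
```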

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.000045637994662983496
train_batch_size: 8
eval_batch_size: 16
seed: 13
gradient_accumulation_steps: 2
total_train_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine_with_restarts
lr_scheduler_warmup_steps: 500
num_epochs: 30
mixed_precision_training: Native AMP
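
To make the cosine_with_restarts setting concrete, here is a rough sketch of a linear warmup followed by cosine decay with hard restarts, written in plain Python rather than calling the training library. The learning rate and warmup steps come from the list above; TOTAL_STEPS and NUM_CYCLES are assumptions, since the card states neither the total number of optimizer steps nor the number of restarts.

```python
import math

BASE_LR = 0.000045637994662983496  # learning_rate from the card
WARMUP_STEPS = 500                 # lr_scheduler_warmup_steps from the card
TOTAL_STEPS = 3000                 # assumption: hypothetical total step count
NUM_CYCLES = 1                     # assumption: number of restarts is not stated


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step under the assumed schedule."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to BASE_LR.
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    if progress >= 1.0:
        return 0.0
    # Cosine decay that jumps back to the peak at the start of each cycle.
    cosine = 0.5 * (1.0 + math.cos(math.pi * ((NUM_CYCLES * progress) % 1.0)))
    return BASE_LR * max(0.0, cosine)


for step in (0, 250, 500, 1750, 3000):
    print(step, lr_at(step))
```

With NUM_CYCLES set to 1 the curve is a single cosine decay; larger values restart the rate at its peak at the start of each cycle.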

Training results

Step	Training Loss	Validation Loss	Wer
200	3.729100	1.096018	0.959867
400	0.996900	0.310228	0.443600
600	0.762900	0.210873	0.346117
800	0.621400	0.200381	0.331513
1000	0.408000	0.196382	0.322014
1200	0.320200	0.176281	0.312515
1400	0.315300	0.179433	0.303847
1600	0.445800	0.420985	0.315839
1800	0.644600	0.433833	0.354904
2000	0.550900	0.327117	0.336500
2200	0.498600	0.289830	0.325457
2400	0.488300	0.294309	0.314177
2600	0.491700	0.311175	0.318689
2800	0.508500	0.314744	0.320470
3000	0.499900	0.314834	0.320589
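
As a small worked example, the logged rows can be scanned for the checkpoints with the lowest WER and the lowest validation loss. The values are copied from the table above; the card does not state which checkpoint was released, so this is only a reading aid.

```python
# (step, training loss, validation loss, WER) copied from the results table
RESULTS = [
    (200, 3.729100, 1.096018, 0.959867),
    (400, 0.996900, 0.310228, 0.443600),
    (600, 0.762900, 0.210873, 0.346117),
    (800, 0.621400, 0.200381, 0.331513),
    (1000, 0.408000, 0.196382, 0.322014),
    (1200, 0.320200, 0.176281, 0.312515),
    (1400, 0.315300, 0.179433, 0.303847),
    (1600, 0.445800, 0.420985, 0.315839),
    (1800, 0.644600, 0.433833, 0.354904),
    (2000, 0.550900, 0.327117, 0.336500),
    (2200, 0.498600, 0.289830, 0.325457),
    (2400, 0.488300, 0.294309, 0.314177),
    (2600, 0.491700, 0.311175, 0.318689),
    (2800, 0.508500, 0.314744, 0.320470),
    (3000, 0.499900, 0.314834, 0.320589),
]

best_wer = min(RESULTS, key=lambda row: row[3])   # lowest word error rate
best_loss = min(RESULTS, key=lambda row: row[2])  # lowest validation loss

print(f"lowest WER: step {best_wer[0]}, {best_wer[3] * 100:.2f}%")
print(f"lowest validation loss: step {best_loss[0]}, {best_loss[2]:.6f}")
```

The lowest WER in the table is 0.303847 at step 1400, which as a percentage appears consistent with the 30.38 figure reported for the evaluation set.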

Framework versions

Transformers 4.16.0.dev0
Pytorch 1.10.0+cu102
Datasets 1.17.1.dev0
Tokenizers 0.10.3

Evaluation Commands

To evaluate on mozilla-foundation/common_voice_8_0 with split test:

python eval.py --model_id sammy786/wav2vec2-xlsr-estonian --dataset mozilla-foundation/common_voice_8_0 --config et --split test
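
The figures reported in this card are word error rate (WER) and character error rate (CER). The sketch below shows how these two metrics are commonly defined, as an edit distance divided by the reference length. It is not the eval.py script itself, it applies no text normalization, and the Estonian example sentences are made up for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character-level edit distance divided by the reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical reference transcript and model output.
reference = "tere tulemast eestisse"
hypothesis = "tere tulemas eestisse"
print(wer(reference, hypothesis), cer(reference, hypothesis))
```

In this example one of three words is wrong but only one of 22 characters differs, which illustrates why CER is usually much lower than WER for the same output.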

🔧 Technical Details

The model is based on the pre-trained "facebook/wav2vec2-xls-r-1b" and fine-tuned on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - et dataset. Training used the hyperparameters listed above, including a cosine_with_restarts learning-rate schedule with 500 warmup steps, 30 epochs, and native AMP mixed precision. A 90-10 split of the merged datasets was used for training and evaluation.

📄 License

This project is licensed under the Apache-2.0 license.

Property	Details
Model Type	Fine-tuned version of facebook/wav2vec2-xls-r-1b on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - et dataset
Training Data	Common Voice Estonian train.tsv, dev.tsv and other.tsv
