sammy786/wav2vec2-xlsr-georgian
This model is a fine-tuned version of facebook/wav2vec2-xls-r-1b on the mozilla-foundation/common_voice_8_0 Georgian (ka) dataset. It is designed for automatic speech recognition and achieves the results reported below on the evaluation sets.
Features
- This model is fine-tuned for the Georgian language on the Common Voice 8.0 dataset.
- It can be used for automatic speech recognition tasks, with specific performance metrics on different datasets.
Installation
No installation steps are provided in the original document, so this section is skipped.
Usage Examples
No code examples are provided in the original document.
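As an editorial illustration only, a minimal inference sketch is shown below. It assumes the standard transformers Wav2Vec2 API, torchaudio for audio loading, and a placeholder audio file path; it is not code from the model author.

```python
# Editorial sketch (not from the original card): transcribe a Georgian audio
# clip with this checkpoint. "example_georgian_clip.wav" is a placeholder path.
# Assumed dependencies: pip install transformers torch torchaudio
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "sammy786/wav2vec2-xlsr-georgian"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# Load the audio and resample it to the 16 kHz rate the model expects.
waveform, sample_rate = torchaudio.load("example_georgian_clip.wav")
waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)
speech = waveform.squeeze().numpy()

inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Greedy CTC decoding of the most likely token at each frame.
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```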
Documentation
Model description
"facebook/wav2vec2-xls-r-1b" was finetuned.
Intended uses & limitations
More information needed
Training and evaluation data
Training data - Common Voice Georgian train.tsv, dev.tsv and other.tsv
Training procedure
To create the train dataset, all available splits were concatenated and a 90-10 train/evaluation split was applied.
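As an illustration only (the card does not include the preprocessing script), such a concatenation and 90-10 split could look like the sketch below using the Hugging Face datasets library; the split names and the seed value are assumptions.

```python
# Editorial sketch of the 90-10 split described above; not the author's script.
# Assumes access to the gated mozilla-foundation/common_voice_8_0 dataset.
from datasets import concatenate_datasets, load_dataset

splits = [
    load_dataset("mozilla-foundation/common_voice_8_0", "ka", split=name, use_auth_token=True)
    for name in ("train", "validation", "other")
]
combined = concatenate_datasets(splits)

# 90% train / 10% evaluation; the fixed seed is an assumption for reproducibility.
split = combined.train_test_split(test_size=0.1, seed=13)
train_ds, eval_ds = split["train"], split["test"]
```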
Training hyperparameters
The following hyperparameters were used during training (an illustrative mapping onto TrainingArguments follows the list):
- learning_rate: 0.000045637994662983496
- train_batch_size: 8
- eval_batch_size: 16
- seed: 13
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: cosine_with_restarts
- lr_scheduler_warmup_steps: 500
- num_epochs: 30
- mixed_precision_training: Native AMP
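For readers unfamiliar with how these values map onto the transformers Trainer, the sketch below expresses the list above as a TrainingArguments configuration. The output directory is a placeholder, and this is not the author's actual training script.

```python
# Editorial sketch: the hyperparameters above expressed as TrainingArguments.
# output_dir is a placeholder; this is not the author's actual training script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./wav2vec2-xlsr-georgian",
    learning_rate=4.5637994662983496e-05,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=13,
    gradient_accumulation_steps=4,   # effective total train batch size of 32
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine_with_restarts",
    warmup_steps=500,
    num_train_epochs=30,
    fp16=True,                       # native AMP mixed-precision training
)
```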
Training results
| Step | Training Loss | Validation Loss | WER |
|------|---------------|-----------------|----------|
| 200  | 4.152100 | 0.823672 | 0.967814 |
| 400  | 0.889500 | 0.196740 | 0.444792 |
| 600  | 0.493700 | 0.155659 | 0.366115 |
| 800  | 0.328000 | 0.138066 | 0.358069 |
| 1000 | 0.260600 | 0.119236 | 0.324989 |
| 1200 | 0.217200 | 0.114050 | 0.313366 |
| 1400 | 0.188800 | 0.112600 | 0.302190 |
| 1600 | 0.166900 | 0.111154 | 0.295485 |
| 1800 | 0.155500 | 0.109963 | 0.286544 |
| 2000 | 0.140400 | 0.107587 | 0.277604 |
| 2200 | 0.142600 | 0.105662 | 0.277157 |
| 2400 | 0.135400 | 0.105414 | 0.275369 |
Framework versions
- Transformers 4.16.0.dev0
- Pytorch 1.10.0+cu102
- Datasets 1.17.1.dev0
- Tokenizers 0.10.3
Evaluation Commands
- To evaluate on mozilla-foundation/common_voice_8_0 with split test:
python eval.py --model_id sammy786/wav2vec2-xlsr-georgian --dataset mozilla-foundation/common_voice_8_0 --config ka --split test
Technical Details
The model is fine-tuned on the mozilla-foundation/common_voice_8_0 Georgian (ka) dataset. During training, all available splits were combined and a 90-10 split was used to create the training and evaluation sets. The hyperparameters listed above were used, and the model's performance was evaluated on several datasets, with WER and CER reported.
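As a point of reference for the WER and CER figures mentioned here and in the model index below, the sketch beneath shows one common way to compute these metrics, assuming the wer and cer metrics bundled with the datasets library; the strings are placeholders rather than real evaluation data.

```python
# Editorial sketch of computing WER/CER; the strings below are placeholders only.
# Assumed dependencies: pip install datasets jiwer
from datasets import load_metric

wer_metric = load_metric("wer")
cer_metric = load_metric("cer")

references = ["this is a reference transcription"]
predictions = ["this is a predicted transcription"]

print("WER:", wer_metric.compute(predictions=predictions, references=references))
print("CER:", cer_metric.compute(predictions=predictions, references=references))
```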
License
The model is released under the Apache 2.0 license.
Model Index
| Property | Details |
|----------|---------|
| Model Name | sammy786/wav2vec2-xlsr-georgian |
| Task | Automatic Speech Recognition |
| Datasets | Common Voice 8 (mozilla-foundation/common_voice_8_0 - ka); Robust Speech Event - Dev Data (speech-recognition-community-v2/dev_data - ka); Robust Speech Event - Test Data (speech-recognition-community-v2/eval_data - ka) |
| Metrics | Common Voice 8: Test WER = 23.9, Test CER = 3.59; Robust Speech Event - Dev Data: Test WER = 75.07; Robust Speech Event - Test Data: Test WER = 74.41 |