wav2vec2-large-xls-r-300m-el
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m for Automatic Speech Recognition on Greek-language data.
🚀 Quick Start
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - EL dataset. It achieves the test results listed under Documentation below (WER 20.7340 % with a language model; 31.1294 % without).
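To try the model quickly, here is a minimal transcription sketch using the transformers ASR pipeline; "sample.wav" is a placeholder path (not a file shipped with the model) for any mono Greek recording.

```python
# Minimal sketch: transcribe a Greek audio file with this model via the
# transformers pipeline. "sample.wav" is a placeholder path (assumption).
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="ayameRushia/wav2vec2-large-xls-r-300m-el",
)

result = asr("sample.wav")
print(result["text"])
```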
✨ Features
- Automatic Speech Recognition: specialized for Greek-language speech recognition.
- Fine-tuned on Common Voice 8: trained on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - EL dataset.
📦 Installation
No specific installation steps are provided in the original document. Based on the Framework versions section below, a reasonable starting point is `pip install transformers datasets torch` (adding `torchaudio` for audio loading is an assumption; it is not listed on the card).
💻 Usage Examples
Basic Usage
Here is how to use `eval.py`:
```bash
huggingface-cli login  # log in to Hugging Face to obtain an auth token for accessing Common Voice v8

# running with LM
python eval.py --model_id ayameRushia/wav2vec2-large-xls-r-300m-el --dataset mozilla-foundation/common_voice_8_0 --config el --split test

# running without LM
python eval.py --model_id ayameRushia/wav2vec2-large-xls-r-300m-el --dataset mozilla-foundation/common_voice_8_0 --config el --split test --greedy
```
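The `--greedy` flag evaluates without the language model, i.e. plain argmax CTC decoding. For reference, here is a minimal sketch of that decoding step (assuming torchaudio is available for audio loading; "sample.wav" is a placeholder path):

```python
# Greedy (no language model) CTC decoding sketch.
# "sample.wav" is a placeholder path; audio is resampled to the 16 kHz
# rate expected by wav2vec2 models.
import torch
import torchaudio
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

model_id = "ayameRushia/wav2vec2-large-xls-r-300m-el"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

waveform, sr = torchaudio.load("sample.wav")
waveform = torchaudio.functional.resample(waveform, sr, 16_000).squeeze(0)

inputs = processor(waveform.numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy decoding: argmax over the vocabulary at each frame, then CTC collapse.
pred_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(pred_ids)[0])
```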
📚 Documentation
Training and evaluation data
Evaluation is conducted in a notebook; see "notebook_evaluation_wav2vec2_el.ipynb" in the repo.
Test results without LM:
- WER: 31.1294 %
- CER: 7.9509 %

Test results using LM:
- WER: 20.7340 %
- CER: 6.0466 %
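For reference, WER and CER are edit-distance error rates computed over words and characters respectively. A toy illustration using the jiwer library (an assumption; eval.py itself may use a different metric implementation):

```python
# Toy illustration of the WER/CER metrics reported above, using jiwer
# (an assumed helper library, not named in the original card).
import jiwer

reference = "καλημέρα σας"   # toy Greek reference, not from the test set
hypothesis = "καλημέρα σα"   # hypothetical model output with one word wrong

print(f"WER: {jiwer.wer(reference, hypothesis):.4f}")  # word-level error rate
print(f"CER: {jiwer.cer(reference, hypothesis):.4f}")  # character-level error rate
```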
Training procedure
Training hyperparameters
The following hyperparameters were used during training (see the TrainingArguments sketch after the list):
- learning_rate: 5e-05
- train_batch_size: 32
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 400
- num_epochs: 80.0
- mixed_precision_training: Native AMP
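These settings map naturally onto transformers.TrainingArguments. In the minimal sketch below, only the values listed above come from the card; output_dir is a placeholder assumption, and fp16=True stands in for "Native AMP".

```python
# Sketch of the listed hyperparameters expressed as TrainingArguments.
# Values are taken from the list above; output_dir is a placeholder.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./wav2vec2-large-xls-r-300m-el",  # placeholder assumption
    learning_rate=5e-05,
    per_device_train_batch_size=32,   # total train batch size 64 via
    gradient_accumulation_steps=2,    # gradient accumulation (32 * 2)
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=400,
    num_train_epochs=80.0,
    fp16=True,  # "Native AMP" mixed-precision training
    # Adam betas (0.9, 0.999) and epsilon 1e-08 are the Trainer defaults.
)
```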
Training results
| Training Loss | Epoch | Step | Validation Loss | WER    |
|---------------|-------|------|-----------------|--------|
| 6.3683        | 8.77  | 500  | 3.1280          | 1.0    |
| 1.9915        | 17.54 | 1000 | 0.6600          | 0.6444 |
| 0.6565        | 26.32 | 1500 | 0.4208          | 0.4486 |
| 0.4484        | 35.09 | 2000 | 0.3885          | 0.4006 |
| 0.3573        | 43.86 | 2500 | 0.3548          | 0.3626 |
| 0.3063        | 52.63 | 3000 | 0.3375          | 0.3430 |
| 0.2751        | 61.4  | 3500 | 0.3359          | 0.3241 |
| 0.2511        | 70.18 | 4000 | 0.3222          | 0.3108 |
| 0.2361        | 78.95 | 4500 | 0.3205          | 0.3084 |
Framework versions
- Transformers 4.17.0.dev0
- Pytorch 1.10.1+cu102
- Datasets 1.18.3
- Tokenizers 0.11.0
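A quick way to check that a local environment matches these versions (illustrative only):

```python
# Print installed versions to compare against the ones listed above.
import transformers, torch, datasets, tokenizers

print("transformers:", transformers.__version__)  # card lists 4.17.0.dev0
print("torch:", torch.__version__)                # card lists 1.10.1+cu102
print("datasets:", datasets.__version__)          # card lists 1.18.3
print("tokenizers:", tokenizers.__version__)      # card lists 0.11.0
```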
📄 License
This model is licensed under the Apache 2.0 license.
📋 Model Information
| Property          | Details                              |
|-------------------|--------------------------------------|
| Model Type        | wav2vec2-large-xls-r-300m-el         |
| Training Data     | mozilla-foundation/common_voice_8_0  |
| Task              | Automatic Speech Recognition         |
| Test WER using LM | 20.7340 %                            |
| Test CER using LM | 6.0466 %                             |