Open-source Automatic Speech Recognition Model wav2vec2, Fine-tuned on Luxembourgish Speech Data

Wav2vec2 Large Xlsr 53 842h Luxembourgish 4h

Developed by Lemswasabi

An automatic speech recognition model further pre-trained on 842 hours of unlabeled Luxembourgish speech and fine-tuned on 4 hours of labeled Luxembourgish speech

Speech Recognition

Transformers

Other

Open Source License: MIT

#Luxembourgish speech recognition #Low-resource language optimization #Cross-lingual pre-training


Release Time: 3/2/2022

Model Overview

This model is a Luxembourgish speech recognition model based on the wav2vec 2.0 large XLSR-53 architecture. Starting from the XLSR-53 checkpoint, it was further pre-trained on 842 hours of unlabeled Luxembourgish speech and then fine-tuned on 4 hours of labeled data.

Model Features

Cross-lingual speech representation

Uses the multilingual XLSR-53 pre-trained model as its foundation, transferring cross-lingual speech representations to the low-resource Luxembourgish language.

Efficient data utilization

Reaches a test-set word error rate of 18.77% with only 4 hours of labeled data.

Two-stage training

Self-supervised pre-training on 842 hours of unlabeled speech first, then supervised fine-tuning on 4 hours of labeled speech.

Model Capabilities

Luxembourgish speech recognition

Speech-to-text
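wav2vec 2.0 speech recognition models usually produce text through a CTC output layer that emits one label per audio frame. The sketch below shows greedy CTC decoding in plain Python, assuming the common `Wav2Vec2CTCTokenizer` conventions (the pad token doubles as the CTC blank and `|` marks word boundaries). The toy vocabulary `TOY_VOCAB` is hypothetical and is not this model's real vocabulary. The card does not say whether its reported scores come from greedy decoding or from decoding with a language model (the paper's keywords mention language modelling).

```python
from itertools import groupby
from typing import List, Sequence

# Hypothetical toy vocabulary; index 0 is assumed to be the CTC blank (<pad>).
TOY_VOCAB = ["<pad>", "|", "m", "o", "i", "e", "n"]


def ctc_greedy_decode(
    frame_ids: Sequence[int],
    vocab: List[str],
    blank_id: int = 0,
    word_delimiter: str = "|",
) -> str:
    """Turn per-frame argmax label ids into text (greedy CTC decoding)."""
    # 1. Merge runs of identical consecutive labels, e.g. [m, m, o] -> [m, o].
    collapsed = [label for label, _ in groupby(frame_ids)]
    # 2. Drop the blank label; a blank between two equal labels keeps both.
    chars = [vocab[i] for i in collapsed if i != blank_id]
    # 3. Map the word-delimiter token to a space and normalize whitespace.
    text = "".join(chars).replace(word_delimiter, " ")
    return " ".join(text.split())
```

For example, the frame sequence `[2, 2, 0, 3, 3, 4, 0, 5, 6, 6]` decodes to `"moien"` with the toy vocabulary: the repeated `m`, `o` and `n` frames collapse, and the blanks are removed.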

Use Cases

Speech transcription

Luxembourgish media content transcription

Automatically transcribes Luxembourgish radio and TV programs into text

Test-set word error rate: 18.77%

Voice assistant

Luxembourgish voice interaction

Can serve as the speech recognition component of localized voice assistants for Luxembourg.

🚀 Lemswasabi/wav2vec2-large-xlsr-53-842h-luxembourgish-4h

This model is fine-tuned for Luxembourgish automatic speech recognition, leveraging 842h of unlabelled and 4h of labelled Luxembourgish speech.

🚀 Quick Start

This README provides detailed information about the Lemswasabi/wav2vec2-large-xlsr-53-842h-luxembourgish-4h model for automatic speech recognition.
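A minimal transcription sketch using the Hugging Face Transformers API follows. It assumes that the repository ships a `Wav2Vec2Processor` (feature extractor plus CTC tokenizer) alongside a `Wav2Vec2ForCTC` checkpoint, which this card does not confirm. It also assumes that `torch` and `librosa` are installed, and that input audio should be resampled to 16 kHz, the rate XLSR-53 was pre-trained on. Decoding is greedy (argmax), without a language model, so results may differ from the scores reported below.

```python
# Minimal sketch: greedy CTC transcription with Hugging Face Transformers.
# Assumptions (not confirmed by this model card): the repo provides a
# Wav2Vec2Processor and a Wav2Vec2ForCTC head; torch and librosa are installed.

MODEL_ID = "Lemswasabi/wav2vec2-large-xlsr-53-842h-luxembourgish-4h"
SAMPLING_RATE = 16_000  # wav2vec 2.0 XLSR-53 operates on 16 kHz mono audio


def transcribe(path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file with greedy (argmax) CTC decoding."""
    # Heavy dependencies are imported lazily so this module imports without them.
    import librosa
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()

    # Load the file as mono float audio, resampled to 16 kHz.
    speech, _ = librosa.load(path, sr=SAMPLING_RATE, mono=True)

    inputs = processor(speech, sampling_rate=SAMPLING_RATE, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits  # shape: (batch, frames, vocab)

    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Usage: `transcribe("interview.wav")`, where `interview.wav` is a hypothetical local recording. Loading the processor and model inside the function keeps the sketch short. For batch transcription you would load them once and reuse them.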

✨ Features

Fine-tuned for Luxembourgish: the model is further pre-trained on a large amount of unlabelled Luxembourgish speech (842h) and then fine-tuned on 4h of labelled Luxembourgish speech from the same domain.
Multiple Metrics: Evaluated using Word Error Rate (WER) and Character Error Rate (CER) on development and test sets.

📚 Documentation

Model description

We further pre-trained a wav2vec 2.0 large XLSR-53 checkpoint on 842h of unlabelled Luxembourgish speech collected from RTL.lu. The model was then fine-tuned on 4h of labelled Luxembourgish speech from the same domain.

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 7.5e-05
train_batch_size: 3
eval_batch_size: 3
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 12
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 2000
num_epochs: 50.0
mixed_precision_training: Native AMP
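Two of the values above follow from the others. The total batch size of 12 is the per-device batch size times the gradient accumulation steps (3 × 4, assuming a single device). The "linear" scheduler with 2000 warmup steps ramps the learning rate from 0 to 7.5e-05, then decays it linearly to 0. The sketch below reproduces that schedule shape, as implemented by Transformers' `get_linear_schedule_with_warmup`, in plain Python. The total number of optimizer steps depends on the number of training utterances, which the card does not give, so `total_optimizer_steps` and any step count passed to it are hypothetical.

```python
import math

# Values copied from the hyperparameter list above.
HYPERPARAMS = {
    "learning_rate": 7.5e-05,
    "train_batch_size": 3,
    "gradient_accumulation_steps": 4,
    "lr_scheduler_warmup_steps": 2000,
    "num_epochs": 50,
}


def effective_batch_size(per_device: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Number of examples contributing to one optimizer update."""
    return per_device * grad_accum_steps * num_devices


def total_optimizer_steps(num_examples: int, batch_size: int, epochs: int) -> int:
    """Approximate optimizer steps for a run (hypothetical helper)."""
    return math.ceil(num_examples / batch_size) * epochs


def linear_warmup_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))
```

With a hypothetical run of 20,000 optimizer steps, `linear_warmup_lr(1000, 7.5e-05, 2000, 20000)` is half the peak rate, the peak is reached at step 2000, and the rate returns to 0 at step 20,000.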

Framework versions

Transformers 4.20.0.dev0
Pytorch 1.11.0+cu113
Datasets 2.2.1
Tokenizers 0.12.1

Model performance

Split	WER (%)	CER (%)
Dev	19.44	7.16
Test	18.77	6.43
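WER and CER are both edit-distance ratios. WER counts the word substitutions, deletions and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. CER applies the same calculation to characters. The sketch below computes corpus-level scores in percent. The card does not state how the authors normalized text (casing, punctuation) before scoring, or whether spaces count toward CER. Counting spaces as characters is an assumption here.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h from the hypothesis
                prev[j - 1] + (r != h),  # substitute (cost 0 if equal)
            )
        prev = curr
    return prev[-1]


def _error_rate(refs: List[Sequence], hyps: List[Sequence]) -> float:
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    if total == 0:
        raise ValueError("references are empty")
    return 100.0 * errors / total


def wer(refs: List[str], hyps: List[str]) -> float:
    """Corpus-level word error rate in percent (whitespace tokenization)."""
    return _error_rate([r.split() for r in refs], [h.split() for h in hyps])


def cer(refs: List[str], hyps: List[str]) -> float:
    """Corpus-level character error rate in percent (spaces count as characters)."""
    return _error_rate(list(refs), list(hyps))
```

Summing errors over the whole corpus before dividing, rather than averaging per-utterance rates, keeps long utterances from being under-weighted.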

Citation

This model is a result of our paper IMPROVING LUXEMBOURGISH SPEECH RECOGNITION WITH CROSS-LINGUAL SPEECH REPRESENTATIONS, submitted to the IEEE SLT 2022 workshop.

@misc{lb-wav2vec2,
  author = {Nguyen, Le Minh and Nayak, Shekhar and Coler, Matt},
  keywords = {Luxembourgish, multilingual speech recognition, language modelling, wav2vec 2.0 XLSR-53, under-resourced language},
  title = {IMPROVING LUXEMBOURGISH SPEECH RECOGNITION WITH CROSS-LINGUAL SPEECH REPRESENTATIONS},
  year = {2022},
  copyright = {2023 IEEE}
}

📄 License

This model is licensed under the MIT license.
