wav2vec2-large-xlsr-53-842h-luxembourgish-14h Open Source Model - Free Deployment for Luxembourgish Speech Recognition

Developed by Lemswasabi

A large wav2vec 2.0 model for Luxembourgish speech recognition, trained with 842 hours of unlabeled and 14 hours of labeled Luxembourgish speech data

Speech Recognition

Transformers

Open Source License: MIT #Luxembourgish speech recognition #Cross-lingual pretraining #Low word error rate

Downloads 204

Release Time: 5/21/2022

Model Overview

This is an automatic speech recognition (ASR) model for Luxembourgish, based on Facebook's wav2vec 2.0 large XLSR-53 checkpoint. The checkpoint was further trained on 842 hours of unlabeled Luxembourgish speech and fine-tuned on 14 hours of labeled speech, and it ships with an integrated language model.

Model Features

Cross-lingual pretraining

Based on the XLSR-53 multilingual model, leveraging cross-lingual representations to enhance Luxembourgish recognition performance

Large-scale data training

Trained using 842 hours of unlabeled and 14 hours of labeled Luxembourgish data

Integrated language model

The model incorporates a language model (LM) to improve recognition accuracy

Low word error rate

Achieves a WER of 10.71% and a CER of 2.31% on the test set

Model Capabilities

Luxembourgish speech recognition

Audio-to-text conversion

Automatic speech transcription

Use Cases

Media transcription

Broadcast content transcription

Transcribing Luxembourgish broadcast content, such as material from RTL.lu

Voice assistants

Luxembourgish voice interaction

Providing recognition capabilities for Luxembourgish voice assistants

🚀 Lemswasabi/wav2vec2-large-xlsr-53-842h-luxembourgish-14h-with-lm

This model is fine-tuned for automatic speech recognition of Luxembourgish from the wav2vec 2.0 large XLSR-53 checkpoint, reaching a test WER of 10.71% and a test CER of 2.31%.

🚀 Quick Start

This model is designed for automatic speech recognition of Luxembourgish. It was fine-tuned on a large amount of Luxembourgish speech data.
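
As a rough sketch of how the checkpoint might be used for transcription, the snippet below loads it through the Hugging Face Transformers `pipeline` API. It assumes `transformers` and `torch` are installed, plus `pyctcdecode` and `kenlm` for language-model decoding and `ffmpeg` for reading audio files. The `transcribe` helper is hypothetical and not part of the original model card.

```python
# Model identifier as given in this model card.
MODEL_ID = "Lemswasabi/wav2vec2-large-xlsr-53-842h-luxembourgish-14h-with-lm"


def transcribe(audio_path: str) -> str:
    """Transcribe one audio file and return the recognized text.

    Hypothetical helper for illustration; not part of the model card.
    """
    # Imported lazily so that defining this helper does not load the library.
    from transformers import pipeline

    # Downloads the checkpoint on first use, then builds an ASR pipeline.
    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)

    # Given a file path, the pipeline decodes the audio (via ffmpeg) at the
    # sampling rate expected by the model's feature extractor.
    result = asr(audio_path)
    return result["text"]
```

Calling `transcribe("sample.wav")` on a Luxembourgish recording (the file name is a placeholder) would return the transcript as a string. When transcribing many files, it is usually better to build the pipeline once and reuse it rather than rebuilding it per call as this sketch does.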

📚 Documentation

🔍 Model description

We fine-tuned a wav2vec 2.0 large XLSR-53 checkpoint with 842h of unlabelled Luxembourgish speech collected from RTL.lu. Then the model was fine-tuned on 14h of labelled Luxembourgish speech from the same domain.

📊 Model Index

Property	Details
Model Name	Lemswasabi/wav2vec2-large-xlsr-53-842h-luxembourgish-14h-with-lm
Task Type	automatic-speech-recognition
Dev WER (%)	11.68
Test WER (%)	10.71
Dev CER (%)	2.64
Test CER (%)	2.31
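
For readers unfamiliar with the two metrics, the following is a minimal sketch of how word error rate (WER) and character error rate (CER) are commonly computed: the edit distance between reference and hypothesis, divided by the reference length. The model card does not say which tool or text normalization the authors used, so the function names and the example sentence here are illustrative assumptions, not the authors' evaluation code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent, splitting on whitespace."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent, counting spaces as characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100 * edit_distance(reference, hypothesis) / len(reference)


# Illustrative pair: one of four words differs by a single character.
print(wer("moien wéi geet et", "moien wei geet et"))  # 25.0
print(cer("moien wéi geet et", "moien wei geet et"))
```

Because a single wrong character makes a whole word count as an error, WER is normally much higher than CER, which matches the pattern in the table above.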

⚙️ Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 7.5e-05
train_batch_size: 3
eval_batch_size: 3
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 12
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 2000
num_epochs: 50.0
mixed_precision_training: Native AMP
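
Two of the listed values follow from the others, and a short sketch may make that relationship clearer. The code below recomputes the total batch size and models a linear schedule with warmup, assuming it behaves like Transformers' `get_linear_schedule_with_warmup` (linear ramp from zero, then linear decay to zero). The single-device assumption and the `total_steps` argument are not stated in the model card; the total number of optimizer steps depends on the dataset size and is left as a parameter.

```python
# Values taken from the hyperparameter list above.
LEARNING_RATE = 7.5e-05
TRAIN_BATCH_SIZE = 3
GRADIENT_ACCUMULATION_STEPS = 4
WARMUP_STEPS = 2000

# Assumption: one training device, which reproduces the reported total of 12.
NUM_DEVICES = 1


def effective_batch_size() -> int:
    """Samples contributing to each optimizer update."""
    return TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * NUM_DEVICES


def linear_lr(step: int, total_steps: int) -> float:
    """Learning rate at a given optimizer step under linear warmup and decay.

    total_steps is a hypothetical input; the model card does not report it.
    """
    if step < WARMUP_STEPS:
        # Warmup: ramp linearly from 0 up to the peak learning rate.
        return LEARNING_RATE * step / WARMUP_STEPS
    # Decay: fall linearly from the peak to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return LEARNING_RATE * remaining / max(1, total_steps - WARMUP_STEPS)


print(effective_batch_size())  # 12
```

With a hypothetical `total_steps` of 10000, `linear_lr` would peak at step 2000 and fall back to zero at step 10000.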

Framework versions

Transformers 4.20.0.dev0
PyTorch 1.11.0+cu113
Datasets 2.2.1
Tokenizers 0.12.1

📄 License

This model is released under the MIT license.

📖 Citation

This model is a result of our paper IMPROVING LUXEMBOURGISH SPEECH RECOGNITION WITH CROSS-LINGUAL SPEECH REPRESENTATIONS, submitted to the IEEE SLT 2022 workshop.

@misc{lb-wav2vec2,
  author = {Nguyen, Le Minh and Nayak, Shekhar and Coler, Matt.},
  keywords = {Luxembourgish, multilingual speech recognition, language modelling, wav2vec 2.0 XLSR-53, under-resourced language},
  title = {IMPROVING LUXEMBOURGISH SPEECH RECOGNITION WITH CROSS-LINGUAL SPEECH REPRESENTATIONS},
  year = {2022},
  copyright = {2023 IEEE}
}
