The open-source model wav2vec2-large-xlsr-53-842h-luxembourgish-14h-with-lm - Accurately recognize Luxembourgish speech


Developed by Lemswasabi

A Luxembourgish speech recognition model fine-tuned from the wav2vec 2.0 large XLSR-53 checkpoint using 842 hours of unlabeled and 14 hours of labeled speech, with a 5-gram language model for rescoring.

Speech Recognition

Transformers

License: MIT

Tags: Luxembourgish Speech Recognition, Low Word Error Rate (WER), Cross-lingual Pretraining


Release Time: 5/24/2022

Model Overview

This model is an automatic speech recognition system for Luxembourgish. It is trained on a large amount of unlabeled speech and a small amount of labeled speech, and uses a language model to improve recognition accuracy.

Model Features

Cross-lingual Pretraining

Fine-tuned based on the XLSR-53 multilingual model, fully leveraging cross-lingual speech representations

Language Model Integration

Uses a 5-gram language model for output rescoring to improve recognition accuracy

Efficient Data Utilization

Combines 842 hours of unlabeled data and 14 hours of labeled data for training

Model Capabilities

Luxembourgish Speech Recognition

Audio to Text

Speech Transcription

Use Cases

Media Transcription

Broadcast Content Transcription

Transcribing Luxembourgish broadcast content such as RTL.lu

Word Error Rate: 9.30% (test set), 9.50% (development set)

Voice Assistants

Luxembourgish Voice Interaction

Providing voice control features for Luxembourgish users

🚀 Lemswasabi/wav2vec2-large-xlsr-53-842h-luxembourgish-14h-with-lm

This model is fine-tuned for Luxembourgish automatic speech recognition, leveraging large-scale unlabelled data and a language model for better performance.

🚀 Quick Start

This README provides detailed information about the Lemswasabi/wav2vec2-large-xlsr-53-842h-luxembourgish-14h-with-lm model, including its description, training details, and evaluation metrics.
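A minimal inference sketch follows. It is not taken from the original README. It assumes that the checkpoint is available on the Hugging Face Hub under the name above, that it ships the language-model decoder files, and that `transformers`, `torch`, `librosa`, `pyctcdecode`, and `kenlm` are installed. The 16 kHz sampling rate is the usual wav2vec 2.0 input rate and is assumed here rather than stated in the source.

```python
MODEL_ID = "Lemswasabi/wav2vec2-large-xlsr-53-842h-luxembourgish-14h-with-lm"
SAMPLING_RATE = 16_000  # assumption: the usual wav2vec 2.0 input rate


def transcribe(audio_path: str) -> str:
    """Transcribe one audio file with CTC decoding plus the bundled language model."""
    # Heavy dependencies are imported lazily so that importing this module stays cheap.
    import librosa
    import torch
    from transformers import AutoModelForCTC, Wav2Vec2ProcessorWithLM

    processor = Wav2Vec2ProcessorWithLM.from_pretrained(MODEL_ID)
    model = AutoModelForCTC.from_pretrained(MODEL_ID)
    model.eval()

    # Load as mono audio and resample to the rate the model expects.
    speech, _ = librosa.load(audio_path, sr=SAMPLING_RATE, mono=True)
    inputs = processor(speech, sampling_rate=SAMPLING_RATE, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits

    # The LM-backed processor decodes from raw logits (a NumPy array), not argmax ids.
    return processor.batch_decode(logits.cpu().numpy()).text[0]
```

Calling `transcribe("sample.wav")`, where `sample.wav` is a hypothetical file name, downloads the checkpoint on first use and returns the decoded text.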

✨ Features

Fine-tuned on 842h of unlabelled Luxembourgish speech and 14h of labelled Luxembourgish speech.
Output transcriptions are rescored with a 5-gram language model.
Reaches a Word Error Rate (WER) of 9.50 on the development set and 9.30 on the test set, with a Character Error Rate (CER) of 2.17 and 2.08 respectively.

📚 Documentation

Model description

We fine-tuned a wav2vec 2.0 large XLSR-53 checkpoint with 842h of unlabelled Luxembourgish speech collected from RTL.lu. Then the model was fine-tuned on 14h of labelled Luxembourgish speech from the same domain. Additionally, we rescore the output transcription with a 5-gram language model trained on text corpora from the same domain.
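The rescoring step can be illustrated with a toy sketch. This is not the authors' decoder: a small add-one-smoothed bigram model stands in for the 5-gram model, the corpus and hypotheses are invented English placeholders, and the weights `alpha` and `beta` are illustrative values. The idea shown is only that each candidate transcription is scored by its acoustic log-probability plus a weighted language-model log-probability and a per-word bonus.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Return a function giving the add-one-smoothed bigram log-probability of a sentence."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[:-1])                   # history counts
        bigrams.update(zip(tokens[:-1], tokens[1:]))   # (history, next word) counts
    vocab_size = len(set(unigrams) | {"</s>"})

    def log_prob(sentence):
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        return sum(
            math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
            for a, b in zip(tokens[:-1], tokens[1:])
        )

    return log_prob


def rescore(hypotheses, lm_log_prob, alpha=0.5, beta=1.0):
    """Pick the hypothesis maximising acoustic + alpha * LM + beta * word count."""
    def total(item):
        text, acoustic_log_prob = item
        return acoustic_log_prob + alpha * lm_log_prob(text) + beta * len(text.split())

    return max(hypotheses, key=total)[0]


# Toy in-domain text and two candidate transcriptions with invented acoustic scores.
lm = train_bigram_lm(["the news at noon", "news at noon today"])
candidates = [("news at noon", -4.1), ("news at known", -4.0)]
best = rescore(candidates, lm)
```

With the language model switched off (`alpha=0.0`) the acoustically stronger but implausible candidate wins; with it switched on, the candidate that matches the in-domain text is selected.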

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 7.5e-05
train_batch_size: 3
eval_batch_size: 3
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 12
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 2000
num_epochs: 50.0
mixed_precision_training: Native AMP
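
As a worked check of the values listed above, the sketch below reproduces the total batch size and the shape of a linear schedule with warmup. The single-device setting is an assumption chosen because 3 × 4 × 1 matches the reported total of 12, and the total number of optimizer steps is not given in the source, so `total_steps` is a hypothetical parameter.

```python
LEARNING_RATE = 7.5e-05
TRAIN_BATCH_SIZE = 3
GRADIENT_ACCUMULATION_STEPS = 4
WARMUP_STEPS = 2000
NUM_DEVICES = 1  # assumption: 3 * 4 * 1 matches the reported total of 12


def total_train_batch_size() -> int:
    """Number of examples contributing to each optimizer update."""
    return TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * NUM_DEVICES


def linear_schedule_lr(step: int, total_steps: int) -> float:
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    remaining = max(0, total_steps - step)
    return LEARNING_RATE * remaining / max(1, total_steps - WARMUP_STEPS)
```

For example, with a hypothetical `total_steps` of 100000, the rate rises from zero to its peak of 7.5e-05 at step 2000 and then falls linearly to zero at the final step.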

Framework versions

Transformers 4.20.0.dev0
Pytorch 1.11.0+cu113
Datasets 2.2.1
Tokenizers 0.12.1

Model Performance

Property	Details
Model Type	wav2vec 2.0 large XLSR-53 fine-tuned for Luxembourgish ASR
Dev WER (%)	9.50
Test WER (%)	9.30
Dev CER (%)	2.17
Test CER (%)	2.08
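
For readers unfamiliar with the metrics, the sketch below shows how WER and CER are commonly computed: the edit distance between reference and hypothesis, summed over the corpus and divided by the total number of reference words or characters. The source does not describe the authors' evaluation script, so text normalisation and the treatment of spaces in CER are assumptions here, and the example sentences are invented.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def error_rate(references, hypotheses, unit="word"):
    """Corpus-level error rate in percent: total edits / total reference units."""
    split = str.split if unit == "word" else list  # characters include spaces here
    edits = total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_units, hyp_units = split(ref), split(hyp)
        edits += edit_distance(ref_units, hyp_units)
        total += len(ref_units)
    return 100.0 * edits / total


wer = error_rate(["the news at noon"], ["the news at known"], unit="word")
cer = error_rate(["the news at noon"], ["the news at known"], unit="char")
```

In this invented example one of four reference words is wrong, giving a WER of 25.0.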

📄 License

This model is released under the MIT license.

🔧 Technical Details

Citation

This model is a result of our paper IMPROVING LUXEMBOURGISH SPEECH RECOGNITION WITH CROSS-LINGUAL SPEECH REPRESENTATIONS, submitted to the IEEE SLT 2022 workshop.

@misc{lb-wav2vec2,
  author = {Nguyen, Le Minh and Nayak, Shekhar and Coler, Matt},
  keywords = {Luxembourgish, multilingual speech recognition, language modelling, wav2vec 2.0 XLSR-53, under-resourced language},
  title = {IMPROVING LUXEMBOURGISH SPEECH RECOGNITION WITH CROSS-LINGUAL SPEECH REPRESENTATIONS},
  year = {2022},
  copyright = {2023 IEEE}
}
