Wav2vec2-xls-r-1b-dutch Open-source Dutch Speech Recognition Model

Developed by jonatasgrosman

This is a Dutch automatic speech recognition (ASR) model fine-tuned from the 1-billion-parameter XLS-R model on multiple datasets, including Common Voice 8.0. It expects audio input sampled at 16kHz.

Speech Recognition

Transformers

Open Source License: Apache-2.0

#Dutch speech recognition #Low Word Error Rate (WER) #Large parameter model (1B)

Downloads 146

Release Time: 3/2/2022

Model Overview

This model is designed for Dutch automatic speech recognition. It is fine-tuned from Facebook's XLS-R 1B parameter model and reaches 10.38% WER on the Common Voice 8.0 test set (6.83% with a language model).

Model Features

High-performance Dutch recognition

Achieves 10.38% WER and 3.04% CER on the Common Voice 8.0 test set

Supports language model enhancement

When combined with a language model, WER can be reduced to 6.83% and CER to 2.31%

Multi-dataset training

Trained on multiple datasets including Common Voice 8.0, Multilingual LibriSpeech, and Voxpopuli
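
To put the language-model gain quoted above in relative terms, the short calculation below uses only the Common Voice 8.0 test-set figures from this page. The helper name `relative_reduction` is illustrative and not part of any library.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Return the relative error reduction as a fraction of the baseline."""
    return (baseline - improved) / baseline


# Common Voice 8.0 test-set figures quoted on this page (in percent).
wer_gain = relative_reduction(10.38, 6.83)
cer_gain = relative_reduction(3.04, 2.31)

print(f"Relative WER reduction with LM: {wer_gain:.1%}")
print(f"Relative CER reduction with LM: {cer_gain:.1%}")
```

On these figures the language model removes roughly a third of the word errors and about a quarter of the character errors.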

Model Capabilities

Dutch speech recognition

16kHz audio processing

High-accuracy transcription

Use Cases

Speech transcription

Dutch speech-to-text

Convert Dutch speech content into text

Reaches 10.38% WER on the Common Voice 8.0 test set (6.83% with a language model); error rates are higher on the Robust Speech Event data

Voice assistants

Dutch voice command recognition

Used for Dutch voice assistant or smart home device command recognition

🚀 Fine-tuned XLS-R 1B model for speech recognition in Dutch

This is a fine-tuned model for Dutch speech recognition. It is based on facebook/wav2vec2-xls-r-1b, fine-tuned on Dutch using the train and validation splits of Common Voice 8.0, Multilingual LibriSpeech, and Voxpopuli. When using this model, ensure that your speech input is sampled at 16kHz.
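
Since the model expects 16kHz input, it can help to check a file's sampling rate before transcribing it. The following is a minimal sketch that uses only Python's standard `wave` module, so it covers uncompressed WAV files only; the function names `wav_sample_rate` and `needs_resampling` are illustrative.

```python
import wave

EXPECTED_RATE = 16_000  # sampling rate the model card asks for


def wav_sample_rate(path: str) -> int:
    """Read the sampling rate from the header of an uncompressed WAV file."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path: str, expected: int = EXPECTED_RATE) -> bool:
    """Return True when the file's rate differs from the expected one."""
    return wav_sample_rate(path) != expected
```

Files that fail this check need resampling before inference. The inference script on this page does that at load time by calling `librosa.load` with `sr=16_000`.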

🚀 Quick Start

This model was fine-tuned with the HuggingSound tool, thanks to GPU credits generously provided by OVHcloud.

✨ Features

Automatic Speech Recognition: Capable of accurately transcribing Dutch speech.
Fine-tuned on Multiple Datasets: Trained on Common Voice 8.0, Multilingual LibriSpeech, and Voxpopuli for better performance.

📦 Installation

The source gives no installation command. The usage examples below depend on the huggingsound, torch, librosa, datasets, and transformers Python packages.

💻 Usage Examples

Basic Usage

Using the HuggingSound library:

from huggingsound import SpeechRecognitionModel

model = SpeechRecognitionModel("jonatasgrosman/wav2vec2-xls-r-1b-dutch")
audio_paths = ["/path/to/file.mp3", "/path/to/another_file.wav"]

transcriptions = model.transcribe(audio_paths)
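
The snippet above stops at the `transcribe` call. As a sketch of how the result might be consumed, the helper below assumes the format described in the HuggingSound README: a list with one dict per audio file, holding a `transcription` string and a per-character `probabilities` list. Treat those key names as an assumption to verify against your installed version. The `summarize` helper and the sample data are hypothetical.

```python
from statistics import mean


def summarize(transcriptions):
    """Return (text, mean character probability) for each result dict."""
    summary = []
    for item in transcriptions:
        probs = item.get("probabilities") or []
        confidence = mean(probs) if probs else None
        summary.append((item["transcription"], confidence))
    return summary


# Hypothetical stand-in for the output of model.transcribe(audio_paths);
# the text and numbers are placeholders, not real model output.
example_output = [
    {"transcription": "ja", "probabilities": [0.9, 0.8]},
    {"transcription": "", "probabilities": []},
]

for text, confidence in summarize(example_output):
    print(repr(text), confidence)
```

An empty transcription yields `None` as its confidence, which lets downstream code separate silent or unreadable files from low-confidence ones.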

Advanced Usage

Writing your own inference script:

import torch
import librosa
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

LANG_ID = "nl"
MODEL_ID = "jonatasgrosman/wav2vec2-xls-r-1b-dutch"
SAMPLES = 10

test_dataset = load_dataset("common_voice", LANG_ID, split=f"test[:{SAMPLES}]")

processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)

# Preprocessing the datasets.
# We need to read the audio files as arrays
def speech_file_to_array_fn(batch):
    speech_array, sampling_rate = librosa.load(batch["path"], sr=16_000)
    batch["speech"] = speech_array
    batch["sentence"] = batch["sentence"].upper()
    return batch

test_dataset = test_dataset.map(speech_file_to_array_fn)
inputs = processor(test_dataset["speech"], sampling_rate=16_000, return_tensors="pt", padding=True)

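# Run the forward pass without tracking gradients (inference only).
with torch.no_grad():
    logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits

# Greedy decoding: pick the most likely token at each frame.
predicted_ids = torch.argmax(logits, dim=-1)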
predicted_sentences = processor.batch_decode(predicted_ids)
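
The last two lines of the script perform greedy CTC decoding: the most likely token is taken per frame, then `batch_decode` turns the frame labels into text. The sketch below illustrates that idea in plain Python with a toy vocabulary. It is not the tokenizer's actual implementation, and `VOCAB`, `BLANK_ID`, and `greedy_ctc_decode` are made up for illustration.

```python
from itertools import groupby

# Toy vocabulary for illustration only; the real one ships with the model.
# Index 0 plays the role of the CTC blank (padding) token and "|" marks
# word boundaries, as in Wav2Vec2 CTC tokenizers.
VOCAB = {0: "<pad>", 1: "|", 2: "h", 3: "a", 4: "l", 5: "o"}
BLANK_ID = 0


def greedy_ctc_decode(frame_ids):
    """Collapse repeated frame labels, drop blanks, and map ids to text."""
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    chars = [VOCAB[i] for i in collapsed if i != BLANK_ID]
    return "".join(chars).replace("|", " ").strip()


# Per-frame argmax ids for a made-up utterance.
frames = [2, 2, 0, 3, 3, 4, 0, 4, 4, 5, 0, 0, 1]
print(greedy_ctc_decode(frames))
```

The blank between the two runs of `4` is what allows the double "l" to survive the collapse step; without it the repeated frames would merge into a single character.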

📚 Documentation

Evaluation Commands

To evaluate on mozilla-foundation/common_voice_8_0 with the test split:

python eval.py --model_id jonatasgrosman/wav2vec2-xls-r-1b-dutch --dataset mozilla-foundation/common_voice_8_0 --config nl --split test

To evaluate on speech-recognition-community-v2/dev_data:

python eval.py --model_id jonatasgrosman/wav2vec2-xls-r-1b-dutch --dataset speech-recognition-community-v2/dev_data --config nl --split validation --chunk_length_s 5.0 --stride_length_s 1.0
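
The `--chunk_length_s 5.0 --stride_length_s 1.0` flags suggest that the longer dev recordings are transcribed in overlapping windows. The sketch below shows one way such windows can be laid out. It assumes, without confirmation from this page, that `eval.py` forwards the flags to the Transformers ASR pipeline and that the stride applies to both sides of each chunk. All names are illustrative.

```python
SAMPLE_RATE = 16_000


def chunk_windows(num_samples, chunk_length_s=5.0, stride_length_s=1.0,
                  sample_rate=SAMPLE_RATE):
    """Yield (start, end, keep_start, keep_end) sample indices per chunk.

    Each chunk overlaps its neighbours by the stride on both sides; only the
    [keep_start, keep_end) part is meant to contribute to the final text.
    """
    chunk_len = int(round(chunk_length_s * sample_rate))
    stride = int(round(stride_length_s * sample_rate))
    step = chunk_len - 2 * stride
    if step <= 0:
        raise ValueError("chunk length must exceed twice the stride")

    for start in range(0, num_samples, step):
        end = min(start + chunk_len, num_samples)
        is_first = start == 0
        is_last = start + chunk_len >= num_samples
        keep_start = start if is_first else start + stride
        keep_end = end if is_last else end - stride
        yield start, end, keep_start, keep_end
        if is_last:
            break


# A 10-second clip at 16 kHz
for window in chunk_windows(10 * SAMPLE_RATE):
    print(window)
```

With these settings each 5-second chunk advances by 3 seconds, and the kept regions tile the recording without gaps, so every frame is decoded with up to one second of context on each side.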

Model Information

Property	Details
Model Type	Fine-tuned XLS-R 1B for Dutch speech recognition
Training Data	mozilla-foundation/common_voice_8_0, Multilingual LibriSpeech, Voxpopuli

Results

Task	Dataset	Test WER	Test CER	Test WER (+LM)	Test CER (+LM)	Dev WER	Dev CER	Dev WER (+LM)	Dev CER (+LM)
Automatic Speech Recognition	Common Voice 8	10.38	3.04	6.83	2.31	-	-	-	-
Automatic Speech Recognition	Robust Speech Event - Dev Data	-	-	-	-	31.12	15.92	23.95	14.18
Automatic Speech Recognition	Robust Speech Event - Test Data	20.41	-	-	-	-	-	-	-
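
WER and CER in the table are edit-distance ratios, reported in percent: the number of word-level (or character-level) insertions, deletions, and substitutions divided by the length of the reference. The sketch below is a plain-Python rendering of that definition. It is not the evaluation script used for the reported numbers, which may apply its own text normalization, and the sentence pair is made up.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences (words or characters)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_item != hyp_item)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits over reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


# Made-up sentence pair, not model output.
reference = "de kat zit op de mat"
hypothesis = "de kat zit op mat"
print(f"WER: {wer(reference, hypothesis):.2%}")
print(f"CER: {cer(reference, hypothesis):.2%}")
```

Both functions assume a non-empty reference. Because a single dropped word counts as one word error but only a few character errors, CER is typically lower than WER, as in the table.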

📄 License

This model is licensed under the Apache-2.0 license.

📚 Citation

If you want to cite this model you can use this:

@misc{grosman2021xlsr-1b-dutch,
  title={Fine-tuned {XLS-R} 1{B} model for speech recognition in {D}utch},
  author={Grosman, Jonatas},
  howpublished={\url{https://huggingface.co/jonatasgrosman/wav2vec2-xls-r-1b-dutch}},
  year={2022}
}
