Open-source French Speech Recognition Model: wav2vec2-xls-r-1b-french - Accurately Recognize French Speech Content

Wav2vec2 Xls R 1b French

Developed by jonatasgrosman

This is a French automatic speech recognition model based on the XLS-R 1B architecture, fine-tuned on multiple French speech datasets.

Speech Recognition

Transformers

Language: French

License: Apache-2.0 (open source)

Tags: French speech recognition, Multi-dataset training, XLS-R 1B architecture

Downloads 379

Release date: 3/2/2022

Model Overview

This model is designed for French speech recognition. It is fine-tuned from Facebook's wav2vec2-xls-r-1b model and expects speech input sampled at 16 kHz.

Model Features

Multi-dataset training

The model was fine-tuned on the train and validation splits of several French speech datasets: Common Voice 8.0, MediaSpeech, Multilingual TEDx, Multilingual LibriSpeech, and Voxpopuli.

High performance

Achieved 16.85% WER and 4.66% CER on the Common Voice 8.0 test set.

Supports language model

The model can be combined with a language model to improve recognition accuracy, for example from 16.85% to 16.32% WER on the Common Voice 8.0 test set.

Model Capabilities

French speech recognition

16kHz audio processing

Automatic speech-to-text

Use Cases

Speech transcription

French speech-to-text

Convert French speech content into text format

16.85% WER on the Common Voice 8.0 test set

Voice assistants

French voice command recognition

Recognize and understand French voice commands

🚀 Fine-tuned XLS-R 1B model for speech recognition in French

This is facebook/wav2vec2-xls-r-1b fine-tuned for French speech recognition.

🚀 Quick Start

This model is fine-tuned on French using the train and validation splits of Common Voice 8.0, MediaSpeech, Multilingual TEDx, Multilingual LibriSpeech, and Voxpopuli. When using this model, make sure that your speech input is sampled at 16 kHz.
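
Because the model expects 16 kHz input, it can help to check a file's sampling rate before transcribing it. The following is a minimal sketch using only Python's standard `wave` module. It handles uncompressed WAV files only, and the helper names `wav_sample_rate` and `is_model_ready` are hypothetical; they are not part of HuggingSound or Transformers.

```python
import wave

TARGET_RATE = 16_000  # sampling rate the model expects, per the note above


def wav_sample_rate(path):
    """Return the sampling rate (Hz) stored in a WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def is_model_ready(path):
    """Return True if the WAV file is already sampled at 16 kHz."""
    return wav_sample_rate(path) == TARGET_RATE
```

Files recorded at another rate need to be resampled first; the inference script in this card does so by passing `sr=16_000` to `librosa.load`.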

This model was fine-tuned with the HuggingSound tool, thanks to GPU credits generously provided by OVHcloud.

✨ Features

Language Support: Specifically designed for French speech recognition.
Fine-Tuned Data: Uses multiple datasets for fine-tuning.
Tools and Credits: Fine-tuned with HuggingSound and supported by OVHcloud GPU credits.

💻 Usage Examples

Basic Usage

Using the HuggingSound library:

from huggingsound import SpeechRecognitionModel

model = SpeechRecognitionModel("jonatasgrosman/wav2vec2-xls-r-1b-french")
audio_paths = ["/path/to/file.mp3", "/path/to/another_file.wav"]

transcriptions = model.transcribe(audio_paths)

Advanced Usage

Writing your own inference script:

import torch
import librosa
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

LANG_ID = "fr"
MODEL_ID = "jonatasgrosman/wav2vec2-xls-r-1b-french"
SAMPLES = 10

test_dataset = load_dataset("common_voice", LANG_ID, split=f"test[:{SAMPLES}]")

processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)

# Preprocessing the datasets.
# We need to read the audio files as arrays
def speech_file_to_array_fn(batch):
    speech_array, sampling_rate = librosa.load(batch["path"], sr=16_000)
    batch["speech"] = speech_array
    batch["sentence"] = batch["sentence"].upper()
    return batch

test_dataset = test_dataset.map(speech_file_to_array_fn)
inputs = processor(test_dataset["speech"], sampling_rate=16_000, return_tensors="pt", padding=True)

with torch.no_grad():
    logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits

predicted_ids = torch.argmax(logits, dim=-1)
predicted_sentences = processor.batch_decode(predicted_ids)
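
The argmax-then-decode step at the end of the script is greedy CTC decoding. As a simplified illustration of what that decoding does conceptually (not the library's actual implementation), the sketch below collapses repeated frame predictions and drops the blank token. The toy vocabulary and the blank id are made up for the example; the real vocabulary ships with the model's processor.

```python
def ctc_greedy_collapse(frame_ids, blank_id=0):
    """Collapse consecutive duplicate ids, then drop blank tokens."""
    collapsed = []
    previous = None
    for token_id in frame_ids:
        # Keep an id only when it starts a new run and is not the blank.
        if token_id != previous and token_id != blank_id:
            collapsed.append(token_id)
        previous = token_id
    return collapsed


# Toy vocabulary (hypothetical, for illustration only).
vocab = {0: "<pad>", 1: "O", 2: "U", 3: "I"}

# One predicted id per audio frame, as torch.argmax would produce.
frames = [0, 1, 1, 0, 2, 2, 2, 0, 3, 3]

text = "".join(vocab[i] for i in ctc_greedy_collapse(frames))
print(text)  # prints "OUI"
```

A blank between two identical ids keeps both, which is how CTC represents doubled letters.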

📚 Documentation

Evaluation Commands

To evaluate on the test split of mozilla-foundation/common_voice_8_0:

python eval.py --model_id jonatasgrosman/wav2vec2-xls-r-1b-french --dataset mozilla-foundation/common_voice_8_0 --config fr --split test

To evaluate on the validation split of speech-recognition-community-v2/dev_data:

python eval.py --model_id jonatasgrosman/wav2vec2-xls-r-1b-french --dataset speech-recognition-community-v2/dev_data --config fr --split validation --chunk_length_s 5.0 --stride_length_s 1.0
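
The `--chunk_length_s` and `--stride_length_s` flags suggest that long recordings are transcribed in overlapping windows. The sketch below is a simplified illustration of one common scheme, in which each window carries a stride on both sides, so consecutive windows advance by the chunk length minus twice the stride. This is an assumption about the general technique, not a copy of what `eval.py` does, and `chunk_spans` is a hypothetical helper.

```python
def chunk_spans(total_s, chunk_length_s=5.0, stride_length_s=1.0):
    """Return (start, end) windows in seconds that cover [0, total_s]."""
    # Each window overlaps its neighbours by the stride on both sides.
    step = chunk_length_s - 2 * stride_length_s
    if step <= 0:
        raise ValueError("chunk length must exceed twice the stride")
    spans = []
    start = 0.0
    while True:
        end = min(start + chunk_length_s, total_s)
        spans.append((start, end))
        if end >= total_s:
            break
        start += step
    return spans


# A 12-second recording with the flag values used in the command above.
print(chunk_spans(12.0))
# prints [(0.0, 5.0), (3.0, 8.0), (6.0, 11.0), (9.0, 12.0)]
```

With 5-second chunks and a 1-second stride, each new window starts 3 seconds after the previous one, so every boundary region is seen with context on both sides.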

Model Information

Property	Details
Model Type	Fine-tuned XLS-R 1B model for French speech recognition
Training Data	mozilla-foundation/common_voice_8_0, MediaSpeech, Multilingual TEDx, Multilingual LibriSpeech, Voxpopuli

Results

The model, listed as "XLS-R Wav2Vec2 French by Jonatas Grosman", reports the following performance metrics:

Dataset	Task	Metric	Value (%)
Common Voice 8	Automatic Speech Recognition	Test WER	16.85
Common Voice 8	Automatic Speech Recognition	Test CER	4.66
Common Voice 8	Automatic Speech Recognition	Test WER (+LM)	16.32
Common Voice 8	Automatic Speech Recognition	Test CER (+LM)	4.21
Robust Speech Event - Dev Data	Automatic Speech Recognition	Dev WER	22.34
Robust Speech Event - Dev Data	Automatic Speech Recognition	Dev CER	9.88
Robust Speech Event - Dev Data	Automatic Speech Recognition	Dev WER (+LM)	17.16
Robust Speech Event - Dev Data	Automatic Speech Recognition	Dev CER (+LM)	9.38
Robust Speech Event - Test Data	Automatic Speech Recognition	Test WER	19.15
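
For readers unfamiliar with the metrics in the table, WER (word error rate) and CER (character error rate) are both edit distances divided by the reference length, counted over words and characters respectively. The following is a minimal sketch of the per-sentence definitions in pure Python. The function names and the example sentences are illustrative assumptions; the reported scores come from the evaluation script, whose text normalization and corpus-level aggregation may differ.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate for one non-empty reference sentence."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate for one non-empty reference sentence."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


reference = "LE CHAT DORT ICI"
hypothesis = "LE CHAT DORS ICI"
print(wer(reference, hypothesis))  # prints 0.25
print(cer(reference, hypothesis))  # prints 0.0625
```

One wrong word out of four gives 25% WER, while the same error is a single wrong character out of sixteen, which is why CER values in the table are much lower than WER values.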

📄 License

This model is released under the Apache-2.0 license.

📚 Citation

To cite this model, you can use the following entry:

@misc{grosman2021xlsr-1b-french,
  title={Fine-tuned {XLS-R} 1{B} model for speech recognition in {F}rench},
  author={Grosman, Jonatas},
  howpublished={\url{https://huggingface.co/jonatasgrosman/wav2vec2-xls-r-1b-french}},
  year={2022}
}
