xls-r-1b-cv_8-fr is an open-source automatic speech recognition model that transcribes French speech to text.

xls-r-1b-cv_8-fr

Developed by Plim

This is a French automatic speech recognition model based on facebook/wav2vec2-xls-r-1b and fine-tuned on the French (fr) subset of the mozilla-foundation/common_voice_8_0 dataset.

Speech Recognition

Transformers

French

Open Source License: Apache-2.0

#French Speech Recognition #High Precision WER #Multi-scenario Robustness

Downloads 26

Release Time : 3/2/2022

Model Overview

This model is designed for French speech-to-text conversion. On the Common Voice 8 French test set it reaches 15.40% WER with a language model and 18.33% without one.

Model Features

High-performance French Speech Recognition

Achieves 15.4% WER (with language model) on the Common Voice 8 French test set.

Large-scale Pre-trained Model Fine-tuning

Fine-tuned from the 1-billion-parameter facebook/wav2vec2-xls-r-1b model.

Multi-scenario Adaptability

Also evaluated on the Robust Speech Event dev data, where it reaches 25.05% WER with a language model.

Model Capabilities

French Speech Recognition

High-accuracy Speech-to-Text

Handling different accents and speech qualities

Use Cases

Speech Transcription

French Speech to Text

Convert French speech content into text transcripts

Achieves 15.4% WER (with language model) on the Common Voice 8 French test set

Voice Assistants

French Voice Command Recognition

Transcribe French voice commands to text so that a downstream system can interpret them

🚀 XLS-R-1B - French

This model is facebook/wav2vec2-xls-r-1b fine-tuned for automatic speech recognition on the French subset of mozilla-foundation/common_voice_8_0. On the Common Voice 8 test set it reaches 15.40% WER and 5.36% CER with a language model.

✨ Features

Automatic Speech Recognition: Specialized for French speech recognition tasks.
Fine-Tuned on Quality Data: Trained on the French subset of mozilla-foundation/common_voice_8_0.
Multiple Evaluation Metrics: Evaluated using WER and CER, both with and without a language model.


📚 Documentation

Model description

This model is a fine-tuned version of facebook/wav2vec2-xls-r-1b on the French (fr) subset of the mozilla-foundation/common_voice_8_0 dataset.
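The original card ships no usage code, so the following is only a minimal sketch of how a checkpoint like this is commonly loaded through the Transformers `pipeline` API. The model id `Plim/xls-r-1b-cv_8-fr` is taken from the evaluation commands in this card; the helper name `transcribe` and the file name `sample_fr.wav` are hypothetical. The default chunk and stride values simply mirror the dev-data evaluation command and are not a recommendation from the author.

```python
# Minimal sketch, not the author's code. Assumes `transformers` and `torch`
# are installed and that ffmpeg is available to decode the audio file.
MODEL_ID = "Plim/xls-r-1b-cv_8-fr"  # id as used in the evaluation commands


def transcribe(audio_path, chunk_length_s=5.0, stride_length_s=1.0):
    """Return the transcript of a French audio file (hypothetical helper)."""
    # Imported lazily so that defining this helper does not require the
    # heavy dependency or trigger a model download.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    result = asr(
        audio_path,
        chunk_length_s=chunk_length_s,    # split long audio into chunks
        stride_length_s=stride_length_s,  # overlap kept on each chunk side
    )
    return result["text"]


# Example call (downloads the 1B-parameter checkpoint on first use):
# print(transcribe("sample_fr.wav"))  # "sample_fr.wav" is a placeholder path
```

Whether decoding uses a language model depends on what the model repository contains and on optional packages such as `pyctcdecode` being installed; that is not verified here.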

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 7.5e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 2000
num_epochs: 6.0
mixed_precision_training: Native AMP
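
To make two of these settings concrete, here is a small sketch of how the total batch size follows from the per-device batch size and gradient accumulation, and of what a linear schedule with warmup looks like. The device count of 1 is inferred from 16 × 8 = 128, and `TOTAL_STEPS` is a rough assumption extrapolated from the training log (step 20000 at epoch 5.73); neither value is stated in the card. The function `linear_lr` is a hypothetical illustration of the schedule shape, not the trainer's own code.

```python
# Values copied from the hyperparameter list above.
TRAIN_BATCH_SIZE = 16
GRAD_ACCUM_STEPS = 8
BASE_LR = 7.5e-05
WARMUP_STEPS = 2000

# Assumptions, not stated in the card.
NUM_DEVICES = 1       # inferred from 16 * 8 = 128
TOTAL_STEPS = 21000   # rough estimate for 6 epochs

# Samples contributing to one optimizer update.
effective_batch = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES


def linear_lr(step):
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    print("effective batch size:", effective_batch)
    for step in (0, 1000, 2000, 13000, 21000):
        print(step, linear_lr(step))
```

Under these assumptions the learning rate peaks at step 2000 and then falls steadily for the rest of training.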

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
0.9827	0.29	1000	inf	0.2937
1.0203	0.57	2000	inf	0.2711
1.0048	0.86	3000	inf	0.2620
0.9858	1.15	4000	inf	0.2522
0.9709	1.43	5000	inf	0.2365
0.9347	1.72	6000	inf	0.2332
0.9256	2.01	7000	inf	0.2261
0.8936	2.29	8000	inf	0.2203
0.877	2.58	9000	inf	0.2096
0.8393	2.87	10000	inf	0.2017
0.8156	3.15	11000	inf	0.1936
0.8015	3.44	12000	inf	0.1880
0.774	3.73	13000	inf	0.1834
0.8372	4.01	14000	inf	0.1934
0.8075	4.3	15000	inf	0.1923
0.8069	4.59	16000	inf	0.1877
0.8064	4.87	17000	inf	0.1955
0.801	5.16	18000	inf	0.1891
0.8022	5.45	19000	inf	0.1895
0.792	5.73	20000	inf	0.1854

The best result on the validation set is reached at step 13000:

WER: 0.1834

A problem occurred while calculating the validation loss, which is why it is reported as inf in the table above.
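
For readers unfamiliar with the metric, the sketch below shows how word error rate (WER) and character error rate (CER) are typically defined: the edit distance between reference and hypothesis, divided by the reference length. It is a plain-Python illustration, not the scoring code used for this model, and the function names and example sentences are made up. Note that the training table reports WER as a fraction (0.1834), while the evaluation results elsewhere in this card use percentages.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = cur
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate as a fraction of the reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate as a fraction of the reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    # One substituted word out of three reference words.
    print(wer("le chat dort", "le chien dort"))
    # One inserted character against a four-character reference.
    print(cer("chat", "chant"))
```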

Framework versions

Transformers 4.17.0.dev0
PyTorch 1.10.2+cu102
Datasets 1.18.3.dev0
Tokenizers 0.11.0

Evaluation Commands

To evaluate on the test split of mozilla-foundation/common_voice_8_0:

python eval.py --model_id Plim/xls-r-1b-cv_8-fr --dataset mozilla-foundation/common_voice_8_0 --config fr --split test

To evaluate on the validation split of speech-recognition-community-v2/dev_data:

python eval.py --model_id Plim/xls-r-1b-cv_8-fr --dataset speech-recognition-community-v2/dev_data --config fr --split validation --chunk_length_s 5.0 --stride_length_s 1.0
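
The second command passes `--chunk_length_s 5.0` and `--stride_length_s 1.0`, which suggests long recordings are transcribed in overlapping chunks. The sketch below illustrates that general idea under stated assumptions: 16 kHz audio, the same stride on both sides of a chunk, and a simplified windowing rule. The function `chunk_windows` is hypothetical and is not the code from `eval.py` or from the Transformers library.

```python
def chunk_windows(total_s, chunk_s=5.0, stride_s=1.0, sample_rate=16000):
    """Return (start, end, left_stride, right_stride) windows in samples.

    Each chunk overlaps its neighbours; the stride regions give the model
    context and are dropped when the chunk outputs are stitched together.
    """
    n = int(total_s * sample_rate)
    chunk = int(chunk_s * sample_rate)
    stride = int(stride_s * sample_rate)
    step = chunk - 2 * stride  # newly covered audio per chunk
    if step <= 0:
        raise ValueError("chunk length must exceed twice the stride length")

    windows = []
    start = 0
    while start < n:
        is_last = start + chunk >= n
        end = min(start + chunk, n)
        left = 0 if start == 0 else stride   # no left context at the start
        right = 0 if is_last else stride     # no right context at the end
        windows.append((start, end, left, right))
        if is_last:
            break
        start += step
    return windows


if __name__ == "__main__":
    for start, end, left, right in chunk_windows(10.0):
        # The kept region is the chunk minus its stride margins.
        print(start, end, "keeps", start + left, "to", end - right)
```

With 5-second chunks and 1-second strides, each chunk contributes 3 seconds of new audio, and the kept regions tile the recording without gaps.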

Evaluation Results

Without LM:

Dataset	WER (%)	CER (%)
TEST CV	18.33	5.60
DEV audio	31.33	13.20
TEST audio	/	/

With LM:

Dataset	WER (%)	CER (%)
TEST CV	15.40	5.36
DEV audio	25.05	12.45
TEST audio	/	/
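
As a quick worked calculation, the relative error reduction from adding the language model can be derived from the two tables above. The snippet below only re-uses the numbers reported in this card; the helper name `relative_reduction` is made up for illustration.

```python
# (WER %, CER %) copied from the tables above.
WITHOUT_LM = {"TEST CV": (18.33, 5.60), "DEV audio": (31.33, 13.20)}
WITH_LM = {"TEST CV": (15.40, 5.36), "DEV audio": (25.05, 12.45)}


def relative_reduction(before, after):
    """Relative reduction in percent when going from `before` to `after`."""
    return (before - after) / before * 100.0


if __name__ == "__main__":
    for name in WITHOUT_LM:
        wer_before, cer_before = WITHOUT_LM[name]
        wer_after, cer_after = WITH_LM[name]
        print(
            name,
            "WER reduction: %.1f%%" % relative_reduction(wer_before, wer_after),
            "CER reduction: %.1f%%" % relative_reduction(cer_before, cer_after),
        )
```

On these figures the language model lowers WER by roughly 16% relative on the Common Voice test set and roughly 20% relative on the dev audio, with a smaller effect on CER.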

🔧 Technical Details

Model Index

Property	Details
Model Type	Fine-tuned version of facebook/wav2vec2-xls-r-1b for French automatic speech recognition
Training Data	French (fr) subset of the mozilla-foundation/common_voice_8_0 dataset

Task Results

The model has been evaluated on automatic speech recognition with several datasets:

Automatic Speech Recognition on Common Voice 8:
- Test WER (with LM): 15.4
- Test CER (with LM): 5.36
Automatic Speech Recognition on Robust Speech Event - Dev Data:
- Test WER (with LM): 25.05
- Test CER (with LM): 12.45
Automatic Speech Recognition on Robust Speech Event - Test Data:
- Test WER: 27.1

📄 License

This project is licensed under the Apache-2.0 license.
