whisper-large-v2-french Open-Source French Speech Recognition Model - Precise Recognition with Over 2,200 Hours of Audio Training

Whisper Large V2 French

Developed by bofenghuang

French speech recognition model fine-tuned from openai/whisper-large-v2, trained on over 2200 hours of French audio data

Speech Recognition

Transformers

Language: French
Open Source License: Apache-2.0
Tags: #French speech recognition #Low word error rate #Multi-dataset training

Release Time: 1/11/2023

Model Overview

This model is optimized for French automatic speech recognition (ASR) and performs well across multiple French speech datasets. It does not predict capitalization or punctuation.

Model Features

Multi-dataset training

Trained on multiple French speech datasets, including Common Voice 11.0, Multilingual LibriSpeech, and VoxPopuli

High performance

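Lower Word Error Rate (WER) than the base openai/whisper-large-v2 model on MLS, VoxPopuli, and Fleurs (for example, 4.20 versus 7.3 on MLS with greedy search); note that the fine-tuned figures are measured on filtered, preprocessed test sets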

Broad applicability

Supports recognition of both standard French and African-accented French

Model Capabilities

French speech-to-text conversion

High-accuracy speech recognition

Handling various French accents

Use Cases

Speech transcription

French meeting minutes

Convert French meeting recordings into text transcripts

Word error rate below 9%

French media content subtitling

Automatically generate subtitles for French videos

Approximately 5% word error rate on standard French content

Voice assistants

French voice command recognition

Used for voice command recognition in French voice assistants or smart home systems

Performs well with various accents

🚀 Fine-tuned whisper-large-v2 model for ASR in French

This model is a fine-tuned version of openai/whisper-large-v2 for automatic speech recognition in French. It is trained on a large composite dataset of French speech audio.

🚀 Quick Start

This fine-tuned whisper-large-v2 model is ready to use for automatic speech recognition in French. Make sure your speech input is sampled at 16 kHz. Note that the model does not predict casing or punctuation.
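
As a quick sanity check before transcription, the sampling rate of a WAV file can be read with Python's standard wave module. This is a minimal sketch and not part of the original model card: the helper names wav_sample_rate and needs_resampling are hypothetical, and the code only handles uncompressed WAV files.

```python
import wave

# Sampling rate the model expects, as stated above (16 kHz).
EXPECTED_RATE = 16_000


def wav_sample_rate(path: str) -> int:
    """Return the sampling rate (in Hz) stored in a WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path: str, expected_rate: int = EXPECTED_RATE) -> bool:
    """Return True if the file's sampling rate differs from the expected one."""
    return wav_sample_rate(path) != expected_rate
```

If needs_resampling returns True, the audio should be resampled to 16 kHz before it is passed to the model.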

✨ Features

Fine-tuned: Based on openai/whisper-large-v2 and fine-tuned on over 2200 hours of French speech audio.
Multi-dataset training: Trained on a composite dataset including Common Voice 11.0, Multilingual LibriSpeech, and others.
Low WER: Reaches WERs between 4.20 and 9.10 (greedy search) on the Common Voice 11.0, MLS, VoxPopuli, and Fleurs test sets.

📦 Installation

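The original model card lists no installation steps. The examples below import torch, transformers, and datasets, and the low-level example also imports torchaudio, so these packages need to be installed first.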

💻 Usage Examples

Basic Usage

Inference with 🤗 Pipeline

import torch

from datasets import load_dataset
from transformers import pipeline

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Load pipeline
pipe = pipeline("automatic-speech-recognition", model="bofenghuang/whisper-large-v2-french", device=device)

# NB: set forced_decoder_ids so that generation is forced to transcribe in French
pipe.model.config.forced_decoder_ids = pipe.tokenizer.get_decoder_prompt_ids(language="fr", task="transcribe")

# Load data
ds_mcv_test = load_dataset("mozilla-foundation/common_voice_11_0", "fr", split="test", streaming=True)
test_segment = next(iter(ds_mcv_test))
waveform = test_segment["audio"]

# Run
generated_sentences = pipe(waveform, max_new_tokens=225)["text"]  # greedy
# generated_sentences = pipe(waveform, max_new_tokens=225, generate_kwargs={"num_beams": 5})["text"]  # beam search

# Normalise predicted sentences if necessary
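
Because the model does not predict casing or punctuation, reference transcripts generally need the same treatment before they are compared with its output. The sketch below is one possible normalizer, loosely following the evaluation preprocessing described in the Performance section (French alphabet characters only, no punctuation other than apostrophes). The exact character set and the choice to turn hyphens into spaces are assumptions, and normalize_french is a hypothetical name.

```python
import re
import unicodedata

# Assumption: "French alphabet characters" means a-z plus the accented
# letters and ligatures below. Adjust the set to match your own references.
_ALLOWED_LETTERS = "a-zàâäçéèêëîïôöùûüÿœæ"
_DISALLOWED = re.compile(rf"[^{_ALLOWED_LETTERS}' ]")


def normalize_french(text: str) -> str:
    """Lowercase text and strip everything except French letters and apostrophes."""
    text = unicodedata.normalize("NFC", text).lower()
    text = text.replace("\u2019", "'")  # typographic apostrophe -> ASCII apostrophe
    text = _DISALLOWED.sub(" ", text)   # punctuation, digits, hyphens -> space
    return re.sub(r"\s+", " ", text).strip()
```

Applying the same function to both the reference and the predicted sentence keeps the comparison consistent.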

Advanced Usage

Inference with 🤗 low-level APIs

import torch
import torchaudio

from datasets import load_dataset
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Load model
model = AutoModelForSpeechSeq2Seq.from_pretrained("bofenghuang/whisper-large-v2-french").to(device)
processor = AutoProcessor.from_pretrained("bofenghuang/whisper-large-v2-french", language="french", task="transcribe")

# NB: set forced_decoder_ids so that generation is forced to transcribe in French
model.config.forced_decoder_ids = processor.get_decoder_prompt_ids(language="fr", task="transcribe")

# Sampling rate expected by the model: 16_000 Hz
model_sample_rate = processor.feature_extractor.sampling_rate

# Load data
ds_mcv_test = load_dataset("mozilla-foundation/common_voice_11_0", "fr", split="test", streaming=True)
test_segment = next(iter(ds_mcv_test))
waveform = torch.from_numpy(test_segment["audio"]["array"]).float()  # cast to float32; decoded arrays may be float64
sample_rate = test_segment["audio"]["sampling_rate"]

# Resample
if sample_rate != model_sample_rate:
    resampler = torchaudio.transforms.Resample(sample_rate, model_sample_rate)
    waveform = resampler(waveform)

# Extract input features
inputs = processor(waveform, sampling_rate=model_sample_rate, return_tensors="pt")
input_features = inputs.input_features
input_features = input_features.to(device)

# Generate
generated_ids = model.generate(inputs=input_features, max_new_tokens=225)  # greedy
# generated_ids = model.generate(inputs=input_features, max_new_tokens=225, num_beams=5)  # beam search

# Detokenize
generated_sentences = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

# Normalise predicted sentences if necessary
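
Whisper models process audio in 30-second windows, so the low-level example above is best suited to clips of at most that length. For longer recordings, one naive option is to cut the waveform into consecutive windows and transcribe each one separately. The sketch below shows only the slicing step. split_into_windows is a hypothetical helper, and cutting at fixed boundaries can split words in half.

```python
SAMPLE_RATE = 16_000   # Hz, as expected by the model
WINDOW_SECONDS = 30    # length of one Whisper input window


def split_into_windows(samples, sample_rate=SAMPLE_RATE, window_seconds=WINDOW_SECONDS):
    """Yield consecutive, non-overlapping slices of at most window_seconds each.

    Works on any 1-D sequence that supports len() and slicing
    (list, NumPy array, or torch tensor).
    """
    window = sample_rate * window_seconds
    for start in range(0, len(samples), window):
        yield samples[start:start + window]
```

The 🤗 pipeline offers a chunk_length_s argument that handles chunking with overlap, which is usually the more robust choice for long audio.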

📚 Documentation

Performance

Pre-trained models' WER

Below are the WERs of the pre-trained models on Common Voice 9.0, Multilingual LibriSpeech (MLS), VoxPopuli, and Fleurs. These results are reported in the original Whisper paper.

Model	Common Voice 9.0	MLS	VoxPopuli	Fleurs
openai/whisper-small	22.7	16.2	15.7	15.0
openai/whisper-medium	16.0	8.9	12.2	8.7
openai/whisper-large	14.7	8.9	11.0	7.7
openai/whisper-large-v2	13.9	7.3	11.4	8.3
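
WER counts the word-level substitutions, deletions, and insertions that separate the model output from the reference, divided by the number of reference words. A minimal single-utterance sketch in plain Python follows. It illustrates the metric only and is not the evaluation script behind these tables; word_error_rate is a hypothetical helper name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the WER (in percent) of hypothesis against reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing from output)
                curr[j - 1] + 1,     # insertion (extra word in output)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)
```

Reported scores are usually aggregated over a whole test set (total errors divided by total reference words) rather than averaged per utterance.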

Fine-tuned models' WER

Below are the WERs of the fine-tuned models on Common Voice 11.0, Multilingual LibriSpeech (MLS), VoxPopuli, and Fleurs. Note that these evaluation datasets have been filtered and preprocessed so that they contain only French alphabet characters and no punctuation other than apostrophes. The results in the table are reported as WER (greedy search) / WER (beam search with beam width 5).

Model	Common Voice 11.0	MLS	VoxPopuli	Fleurs
bofenghuang/whisper-small-cv11-french	11.76 / 10.99	9.65 / 8.91	14.45 / 13.66	10.76 / 9.83
bofenghuang/whisper-medium-cv11-french	9.03 / 8.54	6.34 / 5.86	11.64 / 11.35	7.13 / 6.85
bofenghuang/whisper-medium-french	9.03 / 8.73	4.60 / 4.44	9.53 / 9.46	6.33 / 5.94
bofenghuang/whisper-large-v2-cv11-french	8.05 / 7.67	5.56 / 5.28	11.50 / 10.69	5.42 / 5.05
bofenghuang/whisper-large-v2-french	8.15 / 7.83	4.20 / 4.03	9.10 / 8.66	5.22 / 4.98
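
The gap between the two figures in each cell can be expressed as a relative WER reduction. The sketch below does this arithmetic for the bofenghuang/whisper-large-v2-french row. The numbers are copied from the table, while the function name relative_reduction is only illustrative.

```python
# (greedy WER, beam-search WER) for bofenghuang/whisper-large-v2-french,
# copied from the table above.
RESULTS = {
    "Common Voice 11.0": (8.15, 7.83),
    "MLS": (4.20, 4.03),
    "VoxPopuli": (9.10, 8.66),
    "Fleurs": (5.22, 4.98),
}


def relative_reduction(baseline: float, improved: float) -> float:
    """Return the relative reduction from baseline to improved, in percent."""
    return 100.0 * (baseline - improved) / baseline


for dataset, (greedy, beam) in RESULTS.items():
    gain = relative_reduction(greedy, beam)
    print(f"{dataset}: {gain:.1f}% relative WER reduction with beam search")
```

For this row, beam search lowers WER by roughly 4 to 5 percent relative to greedy search on each test set.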

📄 License

This model is released under the Apache-2.0 license.
