whisper-small-cv11-french open-source model: French speech recognition with punctuation prediction

Whisper Small Cv11 French

Developed by bofenghuang

A French automatic speech recognition model fine-tuned from openai/whisper-small on the French subset of Common Voice 11.0. It predicts casing and punctuation.

Speech recognition

Transformers

Language: French · License: Apache-2.0 · #French speech recognition #Multi-dialect support #Low WER

Downloads: 266

Released: 1/5/2023

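Model overview

This model is a version of Whisper-small optimized for French speech recognition. It performs well on several French speech datasets and is suited to French speech-to-text tasks.

Model features

French optimization

Fine-tuned specifically for French speech recognition; on French datasets it outperforms the original Whisper-small model.

Punctuation prediction

Predicts casing and punctuation, producing formatted text.

Multi-dataset support

Performs well on several French speech datasets, including Common Voice, MLS, and VoxPopuli.

Model capabilities

French speech recognition

Speech-to-text

Punctuation prediction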

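Use cases

Speech transcription

French meeting notes

Automatically transcribe recordings of French meetings into written notes.

WER (word error rate): 8.91 to 14.45, depending on the dataset and decoding strategy

French subtitle generation

Automatically generate subtitles for French video content.

Voice assistants

French voice command recognition

Recognize spoken commands in French voice assistants.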

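🚀 Fine-tuned whisper-small model for French automatic speech recognition

This model is a fine-tuned version of openai/whisper-small, trained on the French subset of mozilla-foundation/common_voice_11_0. When using the model, make sure the speech input is sampled at 16 kHz. The model also predicts casing and punctuation.

✨ Key features

Fine-tuned from openai/whisper-small on a French dataset.
Predicts casing and punctuation.
Inference can be run with either the 🤗 pipeline or the 🤗 low-level API.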

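📦 Installation

The documentation gives no specific installation steps; refer to the official installation guide for the transformers library. The examples below also import torch, datasets, and torchaudio.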

💻 Usage examples

Basic usage

Inference with the 🤗 pipeline:

import torch

from datasets import load_dataset
from transformers import pipeline

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Load the pipeline
pipe = pipeline("automatic-speech-recognition", model="bofenghuang/whisper-small-cv11-french", device=device)

# NB: set forced_decoder_ids so that generation transcribes in French
pipe.model.config.forced_decoder_ids = pipe.tokenizer.get_decoder_prompt_ids(language="fr", task="transcribe")

# Load data (first sample of the Common Voice 11.0 French test split)
ds_mcv_test = load_dataset("mozilla-foundation/common_voice_11_0", "fr", split="test", streaming=True)
test_segment = next(iter(ds_mcv_test))
waveform = test_segment["audio"]

# Run
generated_sentences = pipe(waveform, max_new_tokens=225)["text"]  # greedy search
# generated_sentences = pipe(waveform, max_new_tokens=225, generate_kwargs={"num_beams": 5})["text"]  # beam search

# Normalize the predicted sentences if necessary

Advanced usage

Inference with the 🤗 low-level API:

import torch
import torchaudio

from datasets import load_dataset
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Load the model and processor
model = AutoModelForSpeechSeq2Seq.from_pretrained("bofenghuang/whisper-small-cv11-french").to(device)
processor = AutoProcessor.from_pretrained("bofenghuang/whisper-small-cv11-french", language="french", task="transcribe")

# NB: set forced_decoder_ids so that generation transcribes in French
model.config.forced_decoder_ids = processor.get_decoder_prompt_ids(language="fr", task="transcribe")

# 16_000 Hz, the sampling rate the model expects
model_sample_rate = processor.feature_extractor.sampling_rate

# Load data (first sample of the Common Voice 11.0 French test split)
ds_mcv_test = load_dataset("mozilla-foundation/common_voice_11_0", "fr", split="test", streaming=True)
test_segment = next(iter(ds_mcv_test))
waveform = torch.from_numpy(test_segment["audio"]["array"])
sample_rate = test_segment["audio"]["sampling_rate"]

# Resample to the model's sampling rate if needed
if sample_rate != model_sample_rate:
    resampler = torchaudio.transforms.Resample(sample_rate, model_sample_rate)
    waveform = resampler(waveform)

# Extract input features
inputs = processor(waveform, sampling_rate=model_sample_rate, return_tensors="pt")
input_features = inputs.input_features
input_features = input_features.to(device)

# Generate (features passed positionally, so the call does not depend on
# whether the first parameter is named `inputs` or `input_features`)
generated_ids = model.generate(input_features, max_new_tokens=225)  # greedy search
# generated_ids = model.generate(input_features, max_new_tokens=225, num_beams=5)  # beam search

# Decode token IDs back to text
generated_sentences = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

# Normalize the predicted sentences if necessary
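
Both examples end with a note to normalize the predictions if necessary. The sketch below shows one possible normalizer, loosely modelled on the evaluation preprocessing described in the performance section (punctuation other than apostrophes removed). The function name `normalize_text` and its exact rules (lowercasing, mapping typographic apostrophes to `'`, turning hyphens into spaces) are assumptions for illustration, not the author's actual preprocessing.

```python
import re
import unicodedata

# Assumption: keep word characters (letters incl. accented ones, digits),
# whitespace and the ASCII apostrophe; everything else becomes a space.
_NOT_ALLOWED = re.compile(r"[^\w\s']")


def normalize_text(text: str) -> str:
    """Lowercase a sentence and strip punctuation except apostrophes."""
    # Compose accents and fold compatibility characters (e.g. no-break spaces).
    text = unicodedata.normalize("NFKC", text)
    # Map the typographic apostrophe to the ASCII one before filtering.
    text = text.replace("\u2019", "'")
    text = text.lower()
    text = _NOT_ALLOWED.sub(" ", text)
    # `\w` also matches underscores, which are not wanted here.
    text = text.replace("_", " ")
    # Collapse runs of whitespace left behind by the substitutions.
    return " ".join(text.split())
```

For instance, `normalize_text("Bonjour, l’été est arrivé !")` returns `"bonjour l'été est arrivé"`. Apply the same function to both references and predictions before scoring, so that casing and punctuation differences do not count as errors.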

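📚 Detailed documentation

Performance

WER of the pretrained models: the table below lists the word error rate (WER) of the pretrained models on Common Voice 9.0, Multilingual LibriSpeech, VoxPopuli, and Fleurs. These results are taken from the original paper.

Model	Common Voice 9.0	MLS	VoxPopuli	Fleurs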
openai/whisper-small	22.7	16.2	15.7	15.0
openai/whisper-medium	16.0	8.9	12.2	8.7
openai/whisper-large	14.7	8.9	11.0	7.7
openai/whisper-large-v2	13.9	7.3	11.4	8.3

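WER of the fine-tuned models: the table below lists the word error rate (WER) of the fine-tuned models on Common Voice 11.0, Multilingual LibriSpeech, VoxPopuli, and Fleurs. Note that these evaluation sets were filtered and preprocessed to contain only French characters, with all punctuation except apostrophes removed. Results are reported as WER (greedy search) / WER (beam search with beam width 5).

Model	Common Voice 11.0	MLS	VoxPopuli	Fleurs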
bofenghuang/whisper-small-cv11-french	11.76 / 10.99	9.65 / 8.91	14.45 / 13.66	10.76 / 9.83
bofenghuang/whisper-medium-cv11-french	9.03 / 8.54	6.34 / 5.86	11.64 / 11.35	7.13 / 6.85
bofenghuang/whisper-medium-french	9.03 / 8.73	4.60 / 4.44	9.53 / 9.46	6.33 / 5.94
bofenghuang/whisper-large-v2-cv11-french	8.05 / 7.67	5.56 / 5.28	11.50 / 10.69	5.42 / 5.05
bofenghuang/whisper-large-v2-french	8.15 / 7.83	4.20 / 4.03	9.10 / 8.66	5.22 / 4.98
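
For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions, deletions, and insertions) between a reference and a prediction, divided by the number of reference words. The tables above express it as a percentage. A minimal pure-Python sketch follows; the function name `word_error_rate` is hypothetical, and this is not the evaluation script behind the reported numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a fraction: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein dynamic programme).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Multiply the result by 100 to compare with the tables. For a corpus-level score, sum the edit distances and the reference lengths over all utterances before dividing, rather than averaging per-utterance values.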