Whisper-Large-V3 French Distilled Open-Source Model: Efficient French Speech Recognition with Lower Resource Use


Whisper Large V3 French Distil Dec16

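Developed by bofenghuang

Whisper-Large-V3-French-Distil-Dec16 is a French speech recognition model obtained by reducing the decoder from 32 layers to 16 and training it by distillation on a large-scale dataset. It substantially lowers memory usage and inference time while keeping accuracy close to that of the original model.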

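Speech recognition

Transformers

Language: French | License: MIT | Tags: French speech recognition, distillation speedup, long-form optimization

Downloads: 2,461

Released: 12/13/2023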

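Model overview

This is a speech recognition model optimized for French. Distillation reduces its complexity, which makes it suitable for applications that need efficient speech transcription.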

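Model features

Efficient distilled architecture

The decoder is reduced from 32 layers to 16, which substantially lowers compute requirements.

Preserved accuracy

Inference is faster while accuracy stays close to that of the original model.

Long-form transcription

Mitigates the risk of hallucination when transcribing long audio.

Multi-framework support

Works with several inference frameworks, including transformers, openai-whisper, and faster-whisper.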

Model capabilities

French speech recognition

Long-audio transcription

Real-time speech-to-text

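Use cases

Customer service

Call transcription

Automatically transcribes French customer-service calls into text.

Performs well on test sets that contain background noise and domain-specific terminology.

Media processing

French video subtitling

Automatically generates subtitles for French video content.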

🚀 Whisper-Large-V3-French-Distil-Dec16

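Whisper-Large-V3-French-Distil is a series of distilled versions of Whisper-Large-V3-French. The variants reduce the number of decoder layers from 32 to 16, 8, 4, or 2 and are trained by distillation on a large-scale dataset; see the referenced paper for details.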

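These distilled variants reduce memory usage and inference time while retaining accuracy to a degree that depends on the number of layers kept. They also lower the risk of hallucination, particularly in long-form transcription. In addition, they can be combined with the original Whisper-Large-V3-French model for speculative decoding, which speeds up inference compared with running that model alone while keeping its outputs consistent.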
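
To make the effect of dropping decoder layers concrete, the following is a rough back-of-the-envelope sketch rather than an official measurement. It assumes the Whisper large dimensions `d_model=1280` and `ffn_dim=5120` (worth checking against the model's own config), counts only the large weight matrices, and ignores biases, layer norms, and embeddings. The helper names `decoder_layer_params` and `decoder_stack_params` are hypothetical.

```python
# Rough weight count for a Whisper-style decoder stack.
# Assumptions: d_model=1280 and ffn_dim=5120 (Whisper large family);
# biases, layer norms, and embeddings are ignored.

def decoder_layer_params(d_model: int = 1280, ffn_dim: int = 5120) -> int:
    """Approximate number of weights in one decoder layer."""
    attention = 4 * d_model * d_model     # q, k, v, and output projections
    feed_forward = 2 * d_model * ffn_dim  # up- and down-projection
    # A decoder layer holds self-attention, cross-attention, and one FFN.
    return 2 * attention + feed_forward


def decoder_stack_params(num_layers: int) -> int:
    """Approximate number of weights in a stack of decoder layers."""
    return num_layers * decoder_layer_params()


for layers in (32, 16, 8, 4, 2):
    millions = decoder_stack_params(layers) / 1e6
    print(f"{layers:>2} decoder layers: ~{millions:.0f}M weights")
```

Because the encoder is left untouched in this estimate, halving the decoder does not halve the whole model; the saving applies to the decoder stack only.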

The model has been converted to several formats so that it can be used from different libraries, including transformers, openai-whisper, faster-whisper, whisper.cpp, candle, and mlx.

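🚀 Quick start

The model performs French speech recognition and can be used from several libraries. The sections below describe how.

✨ Key features

Distillation: fewer decoder layers reduce memory usage and inference time while retaining accuracy.
Fewer hallucinations: lower risk of hallucination in long-form transcription.
Speculative decoding: can be paired with the original model to speed up inference.
Multiple formats: available for several libraries, so it can be used in different environments.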

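📦 Installation

Different usage scenarios need different dependencies. Some common installation commands are listed below. The Hugging Face examples in this document additionally assume that torch, transformers, datasets, and accelerate (required by low_cpu_mem_usage=True) are installed.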

OpenAI Whisper

pip install -U openai-whisper

Faster Whisper

pip install faster-whisper

Whisper.cpp

git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
make

💻 Usage examples

Basic usage

Hugging Face Pipeline

import torch
from datasets import load_dataset
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Load the model and its processor
model_name_or_path = "bofenghuang/whisper-large-v3-french-distil-dec16"
processor = AutoProcessor.from_pretrained(model_name_or_path)
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_name_or_path,
    torch_dtype=torch_dtype,
    low_cpu_mem_usage=True,
)
model.to(device)

# Initialize the pipeline
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    feature_extractor=processor.feature_extractor,
    tokenizer=processor.tokenizer,
    torch_dtype=torch_dtype,
    device=device,
    # chunk_length_s=30,  # enable for long-form transcription
    max_new_tokens=128,
)

# Example audio
dataset = load_dataset("bofenghuang/asr-dummy", "fr", split="test")
sample = dataset[0]["audio"]

# Run the pipeline
result = pipe(sample)
print(result["text"])
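
The commented-out `chunk_length_s=30` option above switches the pipeline to chunked long-form transcription. As a simplified illustration of the idea (not the pipeline's actual implementation), the sketch below splits an audio signal into fixed-length windows that overlap. The function `chunk_bounds` and its `overlap_s` parameter are hypothetical names, and the 16 kHz default reflects the sampling rate Whisper models expect.

```python
from typing import List, Tuple


def chunk_bounds(
    num_samples: int,
    sampling_rate: int = 16_000,
    chunk_length_s: float = 30.0,
    overlap_s: float = 5.0,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of overlapping chunks."""
    chunk = int(chunk_length_s * sampling_rate)
    step = chunk - int(overlap_s * sampling_rate)
    if step <= 0:
        raise ValueError("overlap_s must be shorter than chunk_length_s")

    bounds = []
    start = 0
    while start < num_samples:
        end = min(start + chunk, num_samples)
        bounds.append((start, end))
        if end == num_samples:
            break  # the last chunk reached the end of the audio
        start += step
    return bounds


# 70 seconds of 16 kHz audio, split into 30 s chunks with 5 s overlap
for start, end in chunk_bounds(70 * 16_000):
    print(f"{start / 16_000:.0f}s to {end / 16_000:.0f}s")
```

Each chunk would be transcribed separately and the overlapping regions reconciled afterwards; that merging step is left out of this sketch.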

Advanced usage

Speculative decoding

import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSpeechSeq2Seq,
    AutoProcessor,
    pipeline,
)

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Load the main model and its processor
model_name_or_path = "bofenghuang/whisper-large-v3-french"
processor = AutoProcessor.from_pretrained(model_name_or_path)
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_name_or_path,
    torch_dtype=torch_dtype,
    low_cpu_mem_usage=True,
)
model.to(device)

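# Load the draft (assistant) model.
# AutoModelForCausalLM loads only the Whisper decoder; this presumes the
# draft shares the main model's encoder.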
assistant_model_name_or_path = "bofenghuang/whisper-large-v3-french-distil-dec2"
assistant_model = AutoModelForCausalLM.from_pretrained(
    assistant_model_name_or_path,
    torch_dtype=torch_dtype,
    low_cpu_mem_usage=True,
)
assistant_model.to(device)

# Initialize the pipeline, passing the draft model to generate()
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    feature_extractor=processor.feature_extractor,
    tokenizer=processor.tokenizer,
    torch_dtype=torch_dtype,
    device=device,
    generate_kwargs={"assistant_model": assistant_model},
    max_new_tokens=128,
)

# Example audio
dataset = load_dataset("bofenghuang/asr-dummy", "fr", split="test")
sample = dataset[0]["audio"]

# Run the pipeline
result = pipe(sample)
print(result["text"])
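
The example above relies on the library's assisted generation. To illustrate why speculative decoding with greedy search can return the same tokens as the main model alone, here is a toy sketch in pure Python. It is not the library's implementation: `toy_main` and `toy_draft` are made-up stand-ins for the two models, and `speculative_greedy`, `greedy`, and `lookahead` are hypothetical names.

```python
from typing import Callable, List, Optional, Sequence

NextToken = Callable[[Sequence[int]], int]


def greedy(model: NextToken, prompt: List[int], max_new_tokens: int,
           eos: Optional[int] = None) -> List[int]:
    """Plain greedy decoding with a single model."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(model(tokens))
        if tokens[-1] == eos:
            break
    return tokens


def speculative_greedy(main: NextToken, draft: NextToken, prompt: List[int],
                       max_new_tokens: int, lookahead: int = 4,
                       eos: Optional[int] = None) -> List[int]:
    """Greedy decoding where a draft model proposes tokens for the main model."""
    if lookahead < 1:
        raise ValueError("lookahead must be at least 1")
    tokens = list(prompt)
    limit = len(tokens) + max_new_tokens
    finished = False
    while not finished and len(tokens) < limit:
        # 1) The cheap draft model proposes a short continuation.
        proposal: List[int] = []
        for _ in range(lookahead):
            proposal.append(draft(tokens + proposal))
        # 2) The main model checks each proposed token. Its own choice is
        #    always the one kept, so the output matches plain greedy decoding.
        for guess in proposal:
            target = main(tokens)
            tokens.append(target)
            if target == eos or len(tokens) >= limit:
                finished = True
                break
            if target != guess:
                break  # first disagreement: discard the rest of the proposal
    return tokens


def toy_main(prefix: Sequence[int]) -> int:
    return (prefix[-1] * 3 + len(prefix)) % 11


def toy_draft(prefix: Sequence[int]) -> int:
    # Agrees with toy_main except at every fourth position.
    guess = toy_main(prefix)
    return (guess + 1) % 11 if len(prefix) % 4 == 0 else guess


print(speculative_greedy(toy_main, toy_draft, [1], 20) == greedy(toy_main, [1], 20))
```

In a real system the main model would score all proposed tokens in one forward pass, which is where the speedup comes from. The toy version calls the main model once per token, so it demonstrates only the output equivalence, not the speed gain.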