whisper-small-spanish open-source speech recognition model: free to deploy, built for Spanish transcription

Whisper Small Spanish

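Developed by clu-ling

This speech recognition model is a fine-tuned version of OpenAI's whisper-small, trained on the Spanish subset of Common Voice v11 and focused on Spanish transcription.

Speech Recognition

Transformers

License: Apache-2.0 #SpanishSpeechRecognition #LowWordErrorRate #CommonVoiceFineTuning

Downloads: 298

Released: 12/14/2022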

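Model Overview

whisper-small-spanish is an automatic speech recognition (ASR) model optimized for Spanish; it transcribes Spanish speech to text.

Model Features

Optimized for Spanish

Fine-tuned specifically on Spanish speech, so it performs better on Spanish recognition than the original whisper-small model

Low word error rate

Reaches a word error rate (WER) of 20.68% on the Common Voice test set

Efficient training

Training used mixed precision and a linear learning-rate scheduler

Model Capabilities

Spanish speech recognition

Speech-to-text

Long-form audio processing

Use Cases

Speech Transcription

Spanish meeting notes

Automatically transcribes Spanish meeting recordings into written records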

Accuracy of roughly 80% (consistent with the 20.68% WER reported on Common Voice)

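Voice Assistants

Provides speech recognition for Spanish-language voice assistants

Education

Language learning support

Helps learners of Spanish check their pronunciation accuracy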

🚀 whisper-small-sp

This model is a fine-tuned version of openai/whisper-small on the Common Voice v11 dataset. It achieves the following results on the evaluation set:

Loss: 0.4485
Word error rate (WER): 20.6842
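
To make the WER figure concrete, here is a minimal sketch of how word error rate is usually computed: the word-level edit distance (substitutions, deletions, insertions) divided by the number of words in the reference. The function name `word_error_rate` and the two sample sentences are illustrative assumptions, not part of the model card; the card's own evaluation uses the `evaluate` library's "wer" metric.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sentences: one substitution ("está" -> "esta") and one deletion ("la")
reference = "el gato está en la casa"
hypothesis = "el gato esta en casa"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {100 * wer:.2f}%")  # 2 errors / 6 reference words -> WER: 33.33%
```

On this scale, the reported 20.6842 corresponds to about one word-level error for every five reference words.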

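🚀 Quick Start

This model can be used for speech transcription tasks. Usage examples are given below.

✨ Key Features

Based on a fine-tuned openai/whisper-small model, optimized on a specific dataset.
Provides detailed training hyperparameters and training results.
Includes code examples for transcription and evaluation.

📦 Installation

The model card does not list installation steps, so none are shown here. The examples below import torch, transformers, datasets, and evaluate.

💻 Usage Examples

Basic Usage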

from datasets import load_dataset, Audio
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration

# device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# load the model
processor = WhisperProcessor.from_pretrained("clu-ling/whisper-small-spanish")
model = WhisperForConditionalGeneration.from_pretrained("clu-ling/whisper-small-spanish").to(device)
forced_decoder_ids = processor.get_decoder_prompt_ids(language="es", task="transcribe")

# load the dataset
commonvoice_eval = load_dataset("mozilla-foundation/common_voice_11_0", "es", split="validation", streaming=True)
commonvoice_eval = commonvoice_eval.cast_column("audio", Audio(sampling_rate=16000))
sample = next(iter(commonvoice_eval))["audio"]

# features and generate token ids
input_features = processor(sample["array"], sampling_rate=sample["sampling_rate"], return_tensors="pt").input_features
predicted_ids = model.generate(input_features.to(device), forced_decoder_ids=forced_decoder_ids)

# decode
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)

print(transcription)

Advanced Usage

from transformers.models.whisper.english_normalizer import BasicTextNormalizer
from datasets import load_dataset, Audio
import evaluate
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration

# device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# metric
wer_metric = evaluate.load("wer")

# text normalizer applied to both references and predictions
whisper_norm = BasicTextNormalizer()

# model (moved to the device once, outside the mapping function)
processor = WhisperProcessor.from_pretrained("clu-ling/whisper-small-spanish")
model = WhisperForConditionalGeneration.from_pretrained("clu-ling/whisper-small-spanish").to(device)
forced_decoder_ids = processor.get_decoder_prompt_ids(language="es", task="transcribe")

# dataset
dataset = load_dataset("mozilla-foundation/common_voice_11_0", "es", split="test")  #cache_dir=args.cache_dir
dataset = dataset.cast_column("audio", Audio(sampling_rate=16000))

# for debugging: evaluate on a small shard only
#dataset = dataset.shard(num_shards=10000, index=0)
#print(dataset)

def normalize(batch):
  # normalize the reference (gold) sentence
  batch["gold_text"] = whisper_norm(batch["sentence"])
  return batch

def map_wer(batch):
  # transcribe one example and normalize the prediction
  inputs = processor(batch["audio"]["array"], sampling_rate=batch["audio"]["sampling_rate"], return_tensors="pt").input_features
  with torch.no_grad():
    generated_ids = model.generate(inputs=inputs.to(device), forced_decoder_ids=forced_decoder_ids)
  transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
  batch["predicted_text"] = whisper_norm(transcription)
  return batch

# process GOLD text
processed_dataset = dataset.map(normalize)
# get predictions
predicted = processed_dataset.map(map_wer)

# word error rate
wer = wer_metric.compute(references=predicted['gold_text'], predictions=predicted['predicted_text'])
wer = round(100 * wer, 2)
print("WER:", wer)

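🔧 Technical Details

Training Hyperparameters

The following hyperparameters were used during training:

Learning rate: 0.0005
Training batch size: 16
Evaluation batch size: 8
Random seed: 42
Optimizer: Adam (β1 = 0.9, β2 = 0.999, ε = 1e-08)
Learning-rate scheduler type: linear
Learning-rate scheduler warmup steps: 500
Training steps: 25000
Mixed-precision training: Native AMP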
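
As a rough illustration of what these scheduler settings imply, the sketch below computes the learning rate at a few training steps. It assumes the linear scheduler behaves like the common "linear warmup, then linear decay to zero" schedule; the model card only states the scheduler type, warmup steps, and total steps, so the exact shape is an assumption, and `linear_lr` is a hypothetical helper name.

```python
PEAK_LR = 5e-4        # learning rate from the hyperparameter list
WARMUP_STEPS = 500    # scheduler warmup steps
TOTAL_STEPS = 25_000  # training steps


def linear_lr(step: int) -> float:
    """Learning rate at a given step under linear warmup and linear decay (assumed shape)."""
    if step < WARMUP_STEPS:
        # ramp up from 0 to the peak over the warmup steps
        return PEAK_LR * step / WARMUP_STEPS
    # decay linearly from the peak to 0 over the remaining steps
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


for step in (0, 250, 500, 12_750, 25_000):
    print(f"step {step:>6}: lr = {linear_lr(step):.6f}")
```

Under this assumption the rate peaks at 0.0005 at step 500, is back to half of that around step 12,750, and reaches zero at the final step.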

Training Results

Training Loss	Epoch	Step	Validation Loss	WER
2.2671	0.13	1000	2.2108	76.2667
1.4465	0.26	2000	1.6057	67.8753
1.0997	0.39	3000	1.1928	54.2433
0.9389	0.52	4000	1.0020	47.8307
0.7881	0.65	5000	0.8933	46.0046
0.7596	0.78	6000	0.7721	38.5595
0.5678	0.91	7000	0.6903	36.2897
0.4412	1.04	8000	0.6476	32.7473
0.4239	1.17	9000	0.5973	30.8142
0.3935	1.3	10000	0.5444	29.0208
0.3307	1.43	11000	0.5024	27.0434
0.2937	1.56	12000	0.4608	24.7318
0.2471	1.69	13000	0.4259	22.8940
0.2357	1.82	14000	0.3936	21.6018
0.2292	1.95	15000	0.3776	20.8004
0.1493	2.08	16000	0.4599	24.0491
0.1708	2.21	17000	0.4370	23.3443
0.1385	2.34	18000	0.4277	22.3171
0.1288	2.47	19000	0.4050	21.0118
0.1627	2.6	20000	0.4507	23.4004
0.1675	2.73	21000	0.4346	22.8261
0.159	2.86	22000	0.4179	22.2949
0.1458	2.99	23000	0.3978	21.0810
0.0487	3.12	24000	0.4456	20.8617
0.0401	3.25	25000	0.4485	20.6842
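
The training log can also be read programmatically. The short sketch below copies the step, validation loss, and WER columns from the table above and looks up the checkpoints with the lowest WER and the lowest validation loss; the variable names are illustrative, not from the model card.

```python
# (step, validation loss, WER) copied from the training results table
results = [
    (1000, 2.2108, 76.2667), (2000, 1.6057, 67.8753), (3000, 1.1928, 54.2433),
    (4000, 1.0020, 47.8307), (5000, 0.8933, 46.0046), (6000, 0.7721, 38.5595),
    (7000, 0.6903, 36.2897), (8000, 0.6476, 32.7473), (9000, 0.5973, 30.8142),
    (10000, 0.5444, 29.0208), (11000, 0.5024, 27.0434), (12000, 0.4608, 24.7318),
    (13000, 0.4259, 22.8940), (14000, 0.3936, 21.6018), (15000, 0.3776, 20.8004),
    (16000, 0.4599, 24.0491), (17000, 0.4370, 23.3443), (18000, 0.4277, 22.3171),
    (19000, 0.4050, 21.0118), (20000, 0.4507, 23.4004), (21000, 0.4346, 22.8261),
    (22000, 0.4179, 22.2949), (23000, 0.3978, 21.0810), (24000, 0.4456, 20.8617),
    (25000, 0.4485, 20.6842),
]

best_by_wer = min(results, key=lambda row: row[2])
best_by_loss = min(results, key=lambda row: row[1])

print(f"lowest WER: {best_by_wer[2]} at step {best_by_wer[0]}")
print(f"lowest validation loss: {best_by_loss[1]} at step {best_by_loss[0]}")
```

The two criteria disagree: the lowest WER (20.6842) is at the final step, 25000, while the lowest validation loss (0.3776) is at step 15000, where the WER is 20.8004.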