whisper-medium-id: open-source speech recognition model, free to deploy, with substantially improved Indonesian recognition accuracy

Whisper Medium Id

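Developed by cahya

A speech recognition model fine-tuned from openai/whisper-medium on Indonesian datasets, substantially improving Indonesian recognition accuracy

Speech recognition

Transformers

License: Apache-2.0 #Indonesian speech recognition #Low word error rate #Multi-dataset fine-tuning

Downloads: 1,961

Released: 12/7/2022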

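Model overview

This is an automatic speech recognition (ASR) model optimized for Indonesian. It was fine-tuned on several Indonesian datasets, which substantially lowers its word error rate (WER).

Model features

Optimized for Indonesian

Fine-tuned specifically on Indonesian datasets, it recognizes Indonesian speech considerably more accurately than the base model.

Multi-dataset training

Trained on several Indonesian datasets: mozilla-foundation/common_voice_11_0, magic_data, titml, and google/fleurs.

Low word error rate

On the Common Voice 11 test set the WER is 3.83, compared with 12.62 for the base model.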

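Model capabilities

Indonesian speech recognition

Automatic speech-to-text

Punctuation in the transcribed output

Use cases

Speech transcription

Indonesian meeting notes

Automatically transcribe Indonesian meeting recordings into text

WER as low as 3.83 (Common Voice 11 test set)

Voice assistants

Speech recognition component for Indonesian voice assistant applications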

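🚀 Whisper Medium for Indonesian

This model is a fine-tuned version of openai/whisper-medium on the Indonesian mozilla-foundation/common_voice_11_0, magic_data, titml, and google/fleurs datasets. Its training and evaluation results are reported in the sections that follow.

🔍 Model information

Property	Details
Model type	Whisper Medium for Indonesian
Training data	mozilla-foundation/common_voice_11_0, magic_data, TITML, google/fleurs
Evaluation metric	WER (word error rate)
Base model	openai/whisper-medium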

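🚀 Quick start

The model was fine-tuned on the Indonesian mozilla-foundation/common_voice_11_0, magic_data, titml, and google/fleurs datasets and reaches a WER of 3.83 on the Common Voice 11 test set. Usage instructions and evaluation results follow.

✨ Key features

Fine-tuned on Indonesian datasets for better Indonesian speech recognition.
Training hyperparameters and training results are listed for reference and reproducibility.
Evaluated on multiple datasets to show how the model performs.

📦 Installation

No installation steps are provided; see the installation instructions for the Hugging Face transformers library.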
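Since no installation steps are given, a quick environment check may help before running the usage example. The sketch below only reports which packages are present. The package list is an assumption: the card names only the Transformers library, and PyTorch is assumed here as the backend. `installed_versions` is a hypothetical helper, not part of any library.

```python
from importlib import metadata

# Assumed dependencies: the card names only Transformers; PyTorch as the
# backend is an assumption made for this sketch.
REQUIRED_PACKAGES = ("transformers", "torch")


def installed_versions(packages=REQUIRED_PACKAGES):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

Packages reported as not installed would typically be installed with pip.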

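💻 Usage examples

Basic usage

from transformers import pipeline

# Load the fine-tuned checkpoint as an ASR pipeline.
transcriber = pipeline(
    "automatic-speech-recognition",
    model="cahya/whisper-medium-id",
)
# Force Indonesian transcription instead of relying on language detection.
transcriber.model.config.forced_decoder_ids = (
    transcriber.tokenizer.get_decoder_prompt_ids(
        language="id",
        task="transcribe",
    )
)
# The pipeline returns a dict; the transcript is under the "text" key.
transcription = transcriber("my_audio_file.mp3")
print(transcription["text"])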
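For the meeting-transcription use case, several recordings usually need to be processed in one go. The following is a minimal sketch of that loop, assuming `transcriber` is the pipeline object from the basic usage example and that each call returns a dict with a "text" key. `transcribe_files` and the file names are hypothetical, introduced only for illustration.

```python
from pathlib import Path


def transcribe_files(transcriber, paths):
    """Return {file name: transcript} for each audio path.

    `transcriber` is any callable that takes a file path and returns a
    dict with a "text" key, such as the ASR pipeline created earlier.
    """
    results = {}
    for path in paths:
        output = transcriber(str(path))
        results[Path(path).name] = output["text"].strip()
    return results
```

With the pipeline from the basic usage example, a call might look like `transcribe_files(transcriber, ["meeting_part1.mp3", "meeting_part2.mp3"])`. Passing the transcriber in as an argument keeps the helper easy to exercise with a stand-in function when the model is not available.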

📚 Documentation

Intended uses and limitations

More information needed.

Training and evaluation data

More information needed.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-06
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
training_steps: 10000
mixed_precision_training: Native AMP
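To make the scheduler settings concrete, the sketch below computes the learning rate at a given step. It assumes the "linear" scheduler has the usual shape of a linear warmup from 0 to the peak rate followed by a linear decay to 0 at the final step; the card does not spell out the formula, so treat this as an illustration rather than the exact training code. `linear_schedule_lr` is a hypothetical helper.

```python
# Values taken from the hyperparameter list above.
LEARNING_RATE = 1e-06
WARMUP_STEPS = 500
TRAINING_STEPS = 10000


def linear_schedule_lr(step, base_lr=LEARNING_RATE,
                       warmup_steps=WARMUP_STEPS,
                       total_steps=TRAINING_STEPS):
    """Learning rate at `step` under linear warmup, then linear decay.

    Assumed shape: ramp from 0 to base_lr over the warmup steps, then
    decay linearly so the rate reaches 0 at total_steps.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


for step in (0, 250, 500, 5250, 10000):
    print(step, linear_schedule_lr(step))
```

Under these assumptions the rate peaks at 1e-06 at step 500 and is back to half the peak at step 5250.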

Training results

Training loss	Epoch	Step	Validation loss	WER
0.0427	0.33	1000	0.0664	4.3807
0.042	0.66	2000	0.0658	3.9426
0.0265	0.99	3000	0.0657	3.8274
0.0211	1.32	4000	0.0679	3.8366
0.0212	1.66	5000	0.0682	3.8412
0.0206	1.99	6000	0.0683	3.8689
0.0166	2.32	7000	0.0711	3.9657
0.0095	2.65	8000	0.0717	3.9980
0.0122	2.98	9000	0.0714	3.9795
0.0049	3.31	10000	0.0720	3.9887
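The table can be scanned programmatically to find the strongest checkpoint. The sketch below copies the step, validation loss, and WER columns from the table and picks the row with the lowest value of each metric; the variable names are illustrative only.

```python
# (step, validation loss, WER) copied from the training results table.
TRAINING_LOG = [
    (1000, 0.0664, 4.3807),
    (2000, 0.0658, 3.9426),
    (3000, 0.0657, 3.8274),
    (4000, 0.0679, 3.8366),
    (5000, 0.0682, 3.8412),
    (6000, 0.0683, 3.8689),
    (7000, 0.0711, 3.9657),
    (8000, 0.0717, 3.9980),
    (9000, 0.0714, 3.9795),
    (10000, 0.0720, 3.9887),
]

# Row with the lowest WER and row with the lowest validation loss.
best_by_wer = min(TRAINING_LOG, key=lambda row: row[2])
best_by_loss = min(TRAINING_LOG, key=lambda row: row[1])

print("lowest WER:", best_by_wer)
print("lowest validation loss:", best_by_loss)
```

Both criteria select step 3000 (validation loss 0.0657, WER 3.8274). Training loss keeps falling after that point while validation loss and WER drift upward, which may indicate mild overfitting in the later steps.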

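Evaluation

We evaluated the model on the test sets of two datasets, Common Voice 11 and Google Fleurs. Because Whisper outputs casing and punctuation, we also evaluate on both the raw text and the normalized text (lowercased, punctuation removed). The results are as follows: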

Common Voice 11

Model	WER
cahya/whisper-medium-id	3.83
openai/whisper-medium	12.62
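One way to read these figures is as a relative WER reduction over the base model. The sketch below applies the standard relative-change formula to the Common Voice 11 numbers above and to the raw-text Google Fleurs numbers reported in the next table. `relative_wer_reduction` is a hypothetical helper, and the comparison assumes both models were scored on the same test sets.

```python
# WER values copied from the evaluation tables (raw text, no normalization).
RESULTS = {
    "Common Voice 11": {"fine_tuned": 3.83, "baseline": 12.62},
    "Google Fleurs": {"fine_tuned": 9.74, "baseline": 10.2},
}


def relative_wer_reduction(baseline, fine_tuned):
    """Percentage by which the fine-tuned WER is lower than the baseline."""
    return 100.0 * (baseline - fine_tuned) / baseline


for dataset, wer in RESULTS.items():
    reduction = relative_wer_reduction(wer["baseline"], wer["fine_tuned"])
    print(f"{dataset}: {reduction:.1f}% relative WER reduction")
```

This works out to roughly a 70% relative reduction on Common Voice 11 and about 4.5% on Google Fleurs, so the gain is much larger on the dataset family that dominates the fine-tuning data.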

Google/Fleurs

Model	WER
cahya/whisper-medium-id	9.74
cahya/whisper-medium-id + text normalization	TBD
openai/whisper-medium	10.2
openai/whisper-medium + text normalization	TBD
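The text-normalization rows are still pending. To illustrate what they would measure, here is a minimal sketch of WER computed on raw versus normalized text. It is not the evaluation script behind the tables: the normalizer is a simplified stand-in (lowercasing plus ASCII punctuation removal), WER is reported as a percentage on the assumption that the tables use that scale, and the sample sentences are made up.

```python
import string


def normalize(text):
    """Lowercase and strip ASCII punctuation (a simplified normalizer)."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))


def word_error_rate(reference, hypothesis):
    """WER in percent: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Levenshtein distance over words, keeping only the previous row.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return 100.0 * previous[-1] / len(ref)


reference = "Selamat pagi, apa kabar?"   # made-up reference transcript
hypothesis = "selamat pagi apa kabar"    # made-up model output

raw_wer = word_error_rate(reference, hypothesis)
normalized_wer = word_error_rate(normalize(reference), normalize(hypothesis))
print(raw_wer, normalized_wer)
```

In this toy example the raw WER is 75.0 because three of the four words differ only in casing or punctuation, while the normalized WER is 0.0. That gap is why the raw-text and normalized scores are reported separately.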