whisper-th-large-v3-combined: An Open-Source Thai Speech Recognition Model with a Low Word Error Rate

Whisper Th Large V3 Combined

Developed by biodatlab

A Thai automatic speech recognition model fine-tuned from OpenAI's Whisper Large V3. It achieves a word error rate of 6.59% on the Common Voice 13 Thai test set.

Speech Recognition

Transformers

License: Apache-2.0 #Thai speech recognition #Low word error rate #Multi-dataset fine-tuning

Downloads: 1,354

Released: 2/20/2024

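Model Overview

This is an automatic speech recognition (ASR) model optimized for Thai. It was fine-tuned on augmented versions of the Common Voice 13 and FLEURS datasets and is intended specifically for transcribing Thai speech.

Model Features

Low word error rate

A word error rate (WER) of only 6.59% on the Common Voice 13 Thai test set

Optimized for Thai

Fine-tuned specifically on the characteristics of Thai speech

Mixed-dataset training

Trained on several datasets, including Common Voice 13 and FLEURS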

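Model Capabilities

Thai speech recognition

Audio transcription

Long-form audio processing (30-second chunking supported)

Use Cases

Speech transcription

Thai meeting notes

Automatically transcribe recordings of Thai-language meetings into text

High-accuracy transcripts

Thai media subtitle generation

Automatically generate subtitles for Thai video content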
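For the subtitle use case, the transcription has to be paired with timing information. The sketch below shows one way to turn timestamped segments into SubRip (SRT) text. It assumes the segments arrive as a list of dictionaries with a `timestamp` pair in seconds and a `text` field, which is the shape the Transformers ASR pipeline returns under `"chunks"` when called with `return_timestamps=True`. The helper names `to_srt_timestamp` and `chunks_to_srt` and the sample segments are hypothetical and not part of the model card.

```python
def to_srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks):
    """Convert timestamped segments into SRT-formatted text."""
    blocks = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:
            # Assumption: a missing end time is replaced by the start time.
            end = start
        blocks.append(
            f"{index}\n"
            f"{to_srt_timestamp(start)} --> {to_srt_timestamp(end)}\n"
            f"{chunk['text'].strip()}\n"
        )
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)


# Hypothetical segments, shaped like pipe(..., return_timestamps=True)["chunks"]
example_chunks = [
    {"timestamp": (0.0, 2.5), "text": " สวัสดี"},
    {"timestamp": (2.5, 4.0), "text": " ครับ"},
]
print(chunks_to_srt(example_chunks))
```

Segment boundaries produced by the model may not line up with natural subtitle breaks, so real subtitles would likely need some merging or splitting of segments afterwards.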

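🚀 Whisper Large V3 (Thai): Combined V1

This model is a fine-tuned version of openai/whisper-large-v3, trained on an augmented version of the mozilla-foundation/common_voice_13_0 Thai dataset, the google/fleurs dataset, and additional curated datasets. It achieves the following result on the Common Voice 13 test set:

Word error rate (WER): 6.59% (words segmented with the Deepcut tokenizer)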
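Written Thai does not put spaces between words, so a word error rate depends on how the text is segmented, which is why the figure above names the Deepcut tokenizer. The sketch below illustrates the standard WER definition (word-level edit distance divided by the number of reference words) on lists of tokens that are already segmented. The function name `word_error_rate` and the sample sentences are hypothetical, and this is not the evaluation script used for the reported result.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    if not reference:
        raise ValueError("reference must contain at least one token")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis tokens.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hypothesis, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(reference)


# Hypothetical sentences, already split into words by a segmenter
ref = ["ฉัน", "ชอบ", "กิน", "ข้าว"]
hyp = ["ฉัน", "ชอบ", "ข้าว"]
print(word_error_rate(ref, hyp))  # one deleted word out of four reference words
```

Because the segmenter decides what counts as a word, WER figures computed with different Thai tokenizers are not directly comparable.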

🚀 Quick Start

The following example calls the model through the Hugging Face transformers library:

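import torch
from transformers import pipeline

MODEL_NAME = "biodatlab/whisper-th-large-v3-combined"  # model ID on the Hugging Face Hub
lang = "th"  # transcribe in Thai

# Use the first GPU if one is available; otherwise fall back to the CPU
device = 0 if torch.cuda.is_available() else "cpu"

pipe = pipeline(
    task="automatic-speech-recognition",
    model=MODEL_NAME,
    chunk_length_s=30,  # process long audio in 30-second chunks
    device=device,
)
# Force the decoder to transcribe in Thai instead of auto-detecting the language
pipe.model.config.forced_decoder_ids = pipe.tokenizer.get_decoder_prompt_ids(
    language=lang,
    task="transcribe",
)
text = pipe("audio.mp3")["text"]  # pass in an audio file and transcribe it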
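The `chunk_length_s=30` argument makes the pipeline process long recordings as overlapping 30-second windows and then merge the partial transcripts. The sketch below only illustrates how such windows could be laid out over a recording. It is a simplified approximation, not the library's implementation. The function name `chunk_windows` is hypothetical, and the overlap of one sixth of the chunk length on each side is an assumption based on the documented default for `stride_length_s`.

```python
def chunk_windows(duration_s, chunk_length_s=30.0, stride_length_s=None):
    """Return (start, end) times in seconds of overlapping windows over an audio clip."""
    if stride_length_s is None:
        # Assumed default: overlap of chunk_length_s / 6 on each side.
        stride_length_s = chunk_length_s / 6
    # Each new window advances by the chunk length minus both overlaps.
    step = chunk_length_s - 2 * stride_length_s
    if step <= 0:
        raise ValueError("stride_length_s must be less than half of chunk_length_s")
    windows = []
    start = 0.0
    while True:
        end = min(start + chunk_length_s, duration_s)
        windows.append((start, end))
        if end >= duration_s:
            break  # the last window reaches the end of the audio
        start += step
    return windows


# A hypothetical 70-second recording
for start, end in chunk_windows(70.0):
    print(f"{start:5.1f}s -> {end:5.1f}s")
```

The overlap gives the model context on both sides of each boundary, so words that straddle a cut can still be recognized in at least one window.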

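📚 Detailed Documentation

Model description

This model is a fine-tuned version of openai/whisper-large-v3, trained on an augmented version of the mozilla-foundation/common_voice_13_0 Thai dataset, the google/fleurs dataset, and additional curated datasets. On the Common Voice 13 test set, it reaches a word error rate (WER) of 6.59% (words segmented with the Deepcut tokenizer).

Intended uses and limitations

More information needed.

Training and evaluation data

More information needed.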

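Training procedure

Training hyperparameters

The following hyperparameters were used during training:

Hyperparameter	Value
Learning rate	1e-05
Training batch size	16
Evaluation batch size	16
Random seed	42
Optimizer	AdamW (β1 = 0.9, β2 = 0.999, ε = 1e-08)
Learning rate scheduler type	Linear
Learning rate scheduler warmup steps	500
Training steps	10000
Mixed-precision training	Native AMP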
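To make the scheduler settings concrete, the sketch below computes the learning rate at a given step for a linear schedule with warmup, using the values from the table (peak 1e-05, 500 warmup steps, 10000 training steps). It assumes the common form of this schedule, in which the rate rises linearly from zero during warmup and then decays linearly to zero at the final step. The table only states "Linear" with 500 warmup steps, so the exact shape is an assumption, and the function name `linear_schedule_lr` is hypothetical.

```python
def linear_schedule_lr(step, base_lr=1e-5, warmup_steps=500, total_steps=10_000):
    """Learning rate at `step` for linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Warmup: scale linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay: scale linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


for step in (0, 250, 500, 5_250, 10_000):
    print(step, linear_schedule_lr(step))
```

Under these assumptions the rate peaks at step 500 and is back to half of its peak value at step 5250, midway through the decay phase.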

Framework versions

Transformers 4.37.2
PyTorch 2.1.0
Datasets 2.16.1
Tokenizers 0.15.1

Citation

To cite this model, use the following BibTeX entry:

@misc {thonburian_whisper_med,
    author       = { Atirut Boribalburephan and Zaw Htet Aung and Knot Pipatsrisawat and Titipat Achakulvisut },
    title        = { Thonburian Whisper: A fine-tuned Whisper model for Thai automatic speech recognition },
    year         = 2022,
    url          = { https://huggingface.co/biodatlab/whisper-th-medium-combined },
    doi          = { 10.57967/hf/0226 },
    publisher    = { Hugging Face }
}