whisper-th-medium-combined open-source model: free Thai automatic speech recognition


Whisper Th Medium Combined

Developed by biodatlab

Fine-tuned from openai/whisper-medium on augmented Thai speech datasets for Thai automatic speech recognition

Speech recognition

Transformers

License: Apache-2.0 #Thai-speech-recognition #low-WER-transcription #multi-dataset-fine-tuning

Downloads: 4,167

Release date: 12/14/2022

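Model overview

This Thai automatic speech recognition model is openai/whisper-medium fine-tuned on an augmented version of the Thai subset of mozilla-foundation/common_voice_13_0, the google/fleurs dataset, and curated datasets.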

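Model features

High-accuracy Thai recognition

Achieves a word error rate (WER) of 7.42 on the common-voice-13 test set

Multi-dataset fine-tuning

Fine-tuned on mozilla-foundation/common_voice_13_0, google/fleurs, and curated datasets

Long-form audio support

Handles long recordings by splitting them into 30-second segments (chunk_length_s=30)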
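A simplified sketch of how chunked inference can cover a long recording with overlapping 30-second windows. The function name `chunk_windows` is hypothetical. The default stride of `chunk_length_s / 6` follows the documented default of `stride_length_s` in the transformers ASR pipeline, but the real pipeline also pads the final chunk and merges the overlapping outputs, which this sketch does not attempt.

```python
# Hypothetical helper: a simplified illustration of chunked inference windows,
# not the exact transformers implementation.
def chunk_windows(num_samples, sampling_rate=16000, chunk_length_s=30.0, stride_length_s=None):
    """Return (start, end) sample indices of overlapping windows covering the audio."""
    if stride_length_s is None:
        # Documented default of the transformers ASR pipeline's stride_length_s
        stride_length_s = chunk_length_s / 6
    chunk_len = int(round(chunk_length_s * sampling_rate))
    stride = int(round(stride_length_s * sampling_rate))
    # Consecutive windows overlap by `stride` samples on each side
    step = chunk_len - 2 * stride
    if step <= 0:
        raise ValueError("stride_length_s is too large for chunk_length_s")
    windows = []
    for start in range(0, num_samples, step):
        end = min(start + chunk_len, num_samples)
        windows.append((start, end))
        if end == num_samples:  # this window reaches the end of the audio
            break
    return windows
```

For a 70-second clip at 16 kHz, this yields three windows covering 0-30 s, 20-50 s, and 40-70 s.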

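Model capabilities

Thai speech recognition

Long-form audio transcription

Use cases

Speech transcription

Thai speech-to-text

Converts Thai speech audio files into text

WER: 7.42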

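🚀 Whisper Medium (Thai): Combined V3

This model is openai/whisper-medium fine-tuned on an augmented version of the Thai subset of mozilla-foundation/common_voice_13_0, the google/fleurs dataset, and curated datasets. It achieves the following result on the common-voice-13 test set:

Word error rate (WER): 7.42 (computed with the Deepcut tokenizer)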
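Because Thai is written without spaces between words, a WER figure depends on how transcripts are segmented into words; the card reports its 7.42 using the Deepcut tokenizer. The sketch below computes WER from already-tokenized word lists with a standard Levenshtein alignment. The function name `word_error_rate` is hypothetical, and producing the token lists (for example with Deepcut) is left outside the example so it runs without extra dependencies.

```python
# Minimal sketch: word error rate over pre-tokenized word lists.
# Assumption: reference and hypothesis were already split into words,
# e.g. by a Thai word tokenizer such as Deepcut (not imported here).
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words processed so far
    # and the first j hypothesis words
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(reference)


ref = ["ฉัน", "ชอบ", "กิน", "ข้าว"]
hyp = ["ฉัน", "ชอบ", "กิน", "ขาว"]
print(word_error_rate(ref, hyp) * 100)  # WER as a percentage
```

One substituted word out of four gives 0.25, that is 25%; the card's 7.42 is presumably expressed as a percentage in the same way.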

🚀 Quick start

Model description

The model can be used with Hugging Face's transformers library as follows:

import torch
from transformers import pipeline

MODEL_NAME = "biodatlab/whisper-th-medium-combined"  # model ID on the Hugging Face Hub
lang = "th"  # Thai

device = 0 if torch.cuda.is_available() else "cpu"  # first GPU if available, otherwise CPU

pipe = pipeline(
    task="automatic-speech-recognition",
    model=MODEL_NAME,
    chunk_length_s=30,  # split long audio into 30-second chunks
    device=device,
)
# Force Thai transcription instead of language detection or translation
pipe.model.config.forced_decoder_ids = pipe.tokenizer.get_decoder_prompt_ids(
    language=lang,
    task="transcribe"
)
text = pipe("audio.mp3")["text"]  # transcribe an audio file


🔧 Technical details

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
training_steps: 10000
mixed_precision_training: Native AMP
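The scheduler settings above describe a linear warmup followed by linear decay. The sketch below computes the learning rate at a given optimizer step using the card's values (peak 1e-05, 500 warmup steps, 10,000 training steps). It mirrors the formula of `transformers.get_linear_schedule_with_warmup`, but the function `learning_rate_at` is a hypothetical standalone illustration, not the authors' training script.

```python
# Sketch: linear warmup then linear decay, using the hyperparameters listed above.
PEAK_LR = 1e-5
WARMUP_STEPS = 500
TRAINING_STEPS = 10_000


def learning_rate_at(step, peak_lr=PEAK_LR, warmup_steps=WARMUP_STEPS, total_steps=TRAINING_STEPS):
    """Learning rate at a given optimizer step."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to the peak rate
        return peak_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from the peak rate to 0 at total_steps, then stay at 0
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)
```

For example, the rate reaches 1e-05 at step 500, falls to about 5e-06 at step 5,250, and reaches 0 at step 10,000.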

Framework versions

Transformers 4.37.2
Pytorch 2.1.0
Datasets 2.16.1
Tokenizers 0.15.1

📄 License

This model is released under the Apache-2.0 license.

📚 Detailed documentation

Citation

Cite using BibTeX:

@misc{thonburian_whisper_med,
    author       = { Atirut Boribalburephan and Zaw Htet Aung and Knot Pipatsrisawat and Titipat Achakulvisut },
    title        = { Thonburian Whisper: A fine-tuned Whisper model for Thai automatic speech recognition },
    year         = 2022,
    url          = { https://huggingface.co/biodatlab/whisper-th-medium-combined },
    doi          = { 10.57967/hf/0226 },
    publisher    = { Hugging Face }
}

Model information

Attribute	Details
Model type	Fine-tuned Whisper model for Thai automatic speech recognition
Training data	mozilla-foundation/common_voice_13_0, google/fleurs, and curated datasets