asr-whisper-large-v2-commonvoice-fa: an open-source speech recognition model

Asr Whisper Large V2 Commonvoice Fa

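Developed by SpeechBrain

An automatic speech recognition model based on the whisper-large-v2 architecture, fine-tuned for Persian on the CommonVoice dataset.

Speech recognition

PyTorch

License: Apache-2.0 #Persian speech recognition #Whisper large model #low word error rate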

Released: 1/30/2023

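Model overview

The model performs automatic speech recognition for Persian. It uses the Whisper encoder-decoder architecture and was obtained by fine-tuning on the CommonVoice Persian dataset.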

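Model features

High-performance Persian recognition

Reaches a word error rate (WER) of 31.75% and a character error rate (CER) of 9.38% on the CommonVoice Persian test set.

Built on a pretrained model

Uses the pretrained whisper-large-v2 model as its base, with the encoder kept frozen.

End-to-end training

The whole system is trained end to end, which simplifies the speech recognition pipeline.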

Model capabilities

Persian speech recognition

16 kHz audio processing

Automatic audio normalization

Use cases

Speech transcription

Persian speech transcription

Converts spoken Persian into text.

Reaches a word error rate of 31.75% on the test set.

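🚀 Whisper Large-V2 fine-tuned on CommonVoice Persian

This project provides all the tools needed to use an end-to-end Whisper automatic speech recognition model fine-tuned on the CommonVoice (Persian) dataset within the SpeechBrain framework. For a better experience, we recommend learning more about SpeechBrain.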

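Model information

Property	Details
Model type	Automatic speech recognition model based on Whisper Large-V2, fine-tuned on the CommonVoice Persian dataset
Training data	CommonVoice 10.0 (Persian)
Evaluation metrics	Word error rate (WER), character error rate (CER)
License	Apache-2.0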

Model performance

Release date	Test CER (%)	Test WER (%)	GPUs
01-02-23	9.38	31.75	1xV100 16GB
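
To make the two metrics in the table concrete, here is a minimal sketch of how WER and CER are commonly computed: the edit distance between reference and hypothesis, divided by the reference length, counted over words for WER and over characters for CER. The function names are illustrative, and the text normalization applied by the actual evaluation recipe is not described here, so this is not guaranteed to reproduce the reported numbers.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, deletions and insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference unit
                            curr[j - 1] + 1,      # insert a hypothesis unit
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def error_rate(ref_units: Sequence, hyp_units: Sequence) -> float:
    """Edit distance over reference length, expressed as a percentage."""
    if not ref_units:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref_units, hyp_units) / len(ref_units)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: units are whitespace-separated words."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: units are single characters (spaces included)."""
    return error_rate(list(reference), list(hypothesis))


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    print(f"WER = {wer(ref, hyp):.2f}%  CER = {cer(ref, hyp):.2f}%")
```

In this toy pair one wrong word costs a full word error but only one character error, which is why a CER is typically well below the WER for the same system.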

🚀 Quick start

📦 Install SpeechBrain

First, install transformers and SpeechBrain with the following command:

pip install speechbrain transformers==4.28.0

We recommend reading the tutorials to learn more about SpeechBrain.
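
Because the install command pins transformers to 4.28.0, it can help to confirm what is actually installed before loading the model. The following is a small sketch using only the Python standard library; the helper names `installed_version` and `report` are hypothetical and not part of SpeechBrain.

```python
from importlib import metadata
from typing import List, Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report(expected_transformers: str = "4.28.0") -> List[str]:
    """Summarize the installed versions of the two required packages."""
    lines = []
    for name in ("speechbrain", "transformers"):
        lines.append(f"{name}: {installed_version(name) or 'not installed'}")
    found = installed_version("transformers")
    if found is not None and found != expected_transformers:
        lines.append(
            f"note: transformers {found} differs from the pinned {expected_transformers}"
        )
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```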

💻 Usage examples

Basic usage

The following code shows how to transcribe a Persian audio file with the fine-tuned model:

from speechbrain.inference.ASR import WhisperASR

# Fetch the fine-tuned model and cache its files under savedir.
asr_model = WhisperASR.from_hparams(source="speechbrain/asr-whisper-large-v2-commonvoice-fa", savedir="pretrained_models/asr-whisper-large-v2-commonvoice-fa")
# Transcribe the example Persian recording that ships with the model.
asr_model.transcribe_file("speechbrain/asr-whisper-large-v2-commonvoice-fa/example-fa.wav")

Advanced usage

To run inference on a GPU, add run_opts={"device":"cuda"} when calling from_hparams:

from speechbrain.inference.ASR import WhisperASR

# run_opts={"device":"cuda"} places the model on the GPU for inference.
asr_model = WhisperASR.from_hparams(source="speechbrain/asr-whisper-large-v2-commonvoice-fa", savedir="pretrained_models/asr-whisper-large-v2-commonvoice-fa", run_opts={"device":"cuda"})
asr_model.transcribe_file("speechbrain/asr-whisper-large-v2-commonvoice-fa/example-fa.wav")
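
On machines where a GPU may or may not be present, hard-coding "cuda" will fail. One possible pattern, sketched below, is to build run_opts at runtime and fall back to the CPU. The helpers `build_run_opts` and `load_asr` are hypothetical names introduced for illustration; only `WhisperASR.from_hparams` and its `source`, `savedir` and `run_opts` arguments come from the examples above.

```python
def build_run_opts(prefer_gpu: bool = True) -> dict:
    """Return run_opts with "cuda" only when requested and available."""
    has_cuda = False
    if prefer_gpu:
        try:
            import torch  # imported lazily so the helper works without torch

            has_cuda = torch.cuda.is_available()
        except ImportError:
            has_cuda = False
    return {"device": "cuda" if has_cuda else "cpu"}


def load_asr(prefer_gpu: bool = True):
    """Load the fine-tuned model on the best available device."""
    # Imported here so that build_run_opts can be used on its own.
    from speechbrain.inference.ASR import WhisperASR

    return WhisperASR.from_hparams(
        source="speechbrain/asr-whisper-large-v2-commonvoice-fa",
        savedir="pretrained_models/asr-whisper-large-v2-commonvoice-fa",
        run_opts=build_run_opts(prefer_gpu),
    )
```

Calling `load_asr()` then behaves like the GPU example when CUDA is available and like the basic example otherwise.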

🔧 Training the model

The model was trained with SpeechBrain. To train it from scratch, follow these steps:

Clone the SpeechBrain repository:

git clone https://github.com/speechbrain/speechbrain/

Install the dependencies:

cd speechbrain
pip install -r requirements.txt
pip install -e .

Run the training script:

cd recipes/CommonVoice/ASR/transformer/
python train_with_whisper.py hparams/train_fa_hf_whisper.yaml --data_folder=your_data_folder

The training results (models, logs, etc.) can be found here.

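📚 Detailed documentation

Pipeline description

This automatic speech recognition (ASR) system is built from Whisper's encoder-decoder blocks:

The pretrained whisper-large-v2 encoder is frozen.
The pretrained Whisper tokenizer is used.
The pretrained Whisper-large-v2 decoder (openai/whisper-large-v2) is fine-tuned on the CommonVoice Persian dataset. The resulting acoustic representation is passed to a greedy decoder.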
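
To illustrate what the greedy decoder in the last step does, here is a minimal, framework-free sketch: at every step it takes the single highest-scoring token given the tokens produced so far and stops at the end-of-sequence token. The function `greedy_decode`, the toy scoring function and the token ids are all assumptions made for illustration; this is not SpeechBrain's or Whisper's implementation.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    step_fn: Callable[[List[int]], Sequence[float]],
    bos_id: int,
    eos_id: int,
    max_len: int = 50,
) -> List[int]:
    """Pick the arg-max token at each step until EOS or max_len tokens."""
    tokens = [bos_id]
    for _ in range(max_len):
        scores = step_fn(tokens)  # one score per vocabulary entry
        next_id = max(range(len(scores)), key=scores.__getitem__)
        if next_id == eos_id:
            break
        tokens.append(next_id)
    return tokens[1:]  # drop the start token


def toy_step(prefix: List[int]) -> Sequence[float]:
    """Stand-in for a decoder: scores over a 4-token vocabulary."""
    script = {1: [0.1, 0.0, 0.2, 0.7], 2: [0.1, 0.6, 0.2, 0.1]}
    # After two emitted tokens, put all the score on EOS (id 2).
    return script.get(len(prefix), [0.0, 0.0, 1.0, 0.0])


if __name__ == "__main__":
    print(greedy_decode(toy_step, bos_id=0, eos_id=2))
```

Greedy decoding is cheap because it keeps a single hypothesis, at the cost of never revisiting an early choice the way beam search can.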

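The system was trained on recordings sampled at 16 kHz (single channel). When transcribe_file is called, the code automatically normalizes the audio (that is, it resamples it and selects a single channel).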
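
The two normalization steps can be pictured with the simplified sketch below, which reduces multi-channel frames to one channel and resamples to 16 kHz. It is an illustration only: it averages channels and uses linear interpolation without an anti-aliasing filter, both of which are assumptions, and it is not the code that transcribe_file runs. All function names are hypothetical.

```python
from typing import List, Sequence


def to_mono(frames: Sequence[Sequence[float]]) -> List[float]:
    """Collapse each multi-channel frame to one sample by averaging."""
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(samples: Sequence[float], src_rate: int, dst_rate: int) -> List[float]:
    """Resample by linear interpolation (no low-pass filtering)."""
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    out = []
    for k in range(n_out):
        pos = k * src_rate / dst_rate        # position in source samples
        i = min(int(pos), len(samples) - 2)  # left neighbour, kept in range
        frac = min(pos - i, 1.0)             # weight of the right neighbour
        out.append(samples[i] * (1 - frac) + samples[i + 1] * frac)
    return out


def normalize(frames: Sequence[Sequence[float]], src_rate: int,
              target_rate: int = 16000) -> List[float]:
    """Mono conversion followed by resampling to the target rate."""
    return resample_linear(to_mono(frames), src_rate, target_rate)


if __name__ == "__main__":
    stereo_32k = [[0.0, 2.0], [1.0, 3.0], [2.0, 4.0], [3.0, 5.0]]
    print(normalize(stereo_32k, src_rate=32000))
```

A production resampler would low-pass filter before downsampling to avoid aliasing; the sketch leaves that out to keep the two steps easy to follow.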

Limitations

The SpeechBrain team provides no guarantee of this model's performance on other datasets.

Citing SpeechBrain

If you use this project, please cite the following:

@misc{SB2021,
    author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Ju-Chieh, Chou and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua },
    title = {SpeechBrain},
    year = {2021},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/speechbrain/speechbrain}},
  }