asr-whisper-medium-commonvoice-fa Open-Source Model: Free Deployment for Persian Automatic Speech Recognition

Asr Whisper Medium Commonvoice Fa

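Developed by speechbrain

A whisper-medium model fine-tuned on the CommonVoice 14.0 Persian dataset for Persian automatic speech recognition.

Speech Recognition

PyTorch

License: Apache-2.0 #Persian speech recognition #Whisper fine-tuning #Low word error rate

Downloads: 21

Released: 7/20/2023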

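Model Overview

This model is an automatic speech recognition system built on the whisper-medium architecture and optimized specifically for Persian. It converts Persian audio into text.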

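Model Features

Fine-tuned from a pretrained model

The pretrained whisper-medium model is fine-tuned on Persian data, so it retains the original model's feature extraction capability.

Efficient training

The pretrained Whisper encoder is frozen and only the decoder is fine-tuned, which reduces the number of trainable parameters and improves training efficiency.

Automatic audio processing

Audio normalization is built in, including automatic resampling and mono channel selection.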

Model Capabilities

Persian speech recognition

Audio transcription

Speech-to-text

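Use Cases

Speech transcription

Persian speech-to-text

Converts Persian audio files into text.

Achieves a word error rate (WER) of 35.48% on the CommonVoice test set.

Voice assistants

Persian voice command recognition

Serves as the base recognition module for building Persian voice assistants.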

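🚀 Whisper Medium Model Fine-Tuned on CommonVoice 14.0 Persian

This repository provides all the tools needed to perform automatic speech recognition in SpeechBrain with an end-to-end Whisper model fine-tuned on CommonVoice (Persian). For a better experience, we encourage you to learn more about SpeechBrain.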

The model's performance is as follows:

Release	Test CER (%)	Test WER (%)	GPUs
August 1, 2023	11.27	35.48	1xV100 32GB
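
For readers unfamiliar with the two metrics in the table, the following is a minimal sketch of how WER and CER are commonly computed for a single utterance: the edit distance between reference and hypothesis, divided by the reference length. The function names `edit_distance`, `wer`, and `cer` are hypothetical, and this is not SpeechBrain's scoring code; reported figures are normally aggregated over the whole test set and may treat spaces and punctuation differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent for one utterance (illustrative only)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent for one utterance (illustrative only)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)
```

Because a single wrong character makes the whole word count as an error, WER is typically much higher than CER, which is consistent with the gap between the two columns above.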

🚀 Quick Start

Use the tools in this repository to run automatic speech recognition in SpeechBrain with the end-to-end Whisper model fine-tuned on CommonVoice (Persian).

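✨ Key Features

Automatic speech recognition based on a fine-tuned Whisper model.
Detailed installation, usage, and training steps.
Performance metrics on the test set.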

📦 Installation

First, install transformers and SpeechBrain with the following command:

pip install speechbrain transformers

We encourage you to read our tutorials and learn more about SpeechBrain.

💻 Usage Examples

Basic Usage

The following example transcribes your own Persian audio file:

from speechbrain.inference.ASR import WhisperASR

asr_model = WhisperASR.from_hparams(source="speechbrain/asr-whisper-medium-commonvoice-fa", savedir="pretrained_models/asr-whisper-medium-commonvoice-fa")
asr_model.transcribe_file("speechbrain/asr-whisper-medium-commonvoice-fa/example-fa.mp3")
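
To transcribe several recordings, the same `transcribe_file` call can be wrapped in a loop. The sketch below is only an illustration: `transcribe_folder` is a hypothetical helper, the `*.mp3` pattern is an assumption, and the return value of `transcribe_file` is stored unchanged because its type may differ between SpeechBrain versions.

```python
from pathlib import Path


def transcribe_folder(asr_model, folder, pattern="*.mp3"):
    """Transcribe every matching file in `folder`.

    `asr_model` is any object exposing transcribe_file(path), such as the
    WhisperASR instance loaded above. Results are keyed by file name.
    """
    results = {}
    for path in sorted(Path(folder).glob(pattern)):
        # The output is kept as returned; its exact type is not assumed here.
        results[path.name] = asr_model.transcribe_file(str(path))
    return results


# Example (assumes `asr_model` was created as shown above and that
# "my_persian_audio" is a local folder of recordings):
# transcripts = transcribe_folder(asr_model, "my_persian_audio")
```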

Advanced Usage

To run inference on a GPU, add run_opts={"device":"cuda"} when calling the from_hparams method:

from speechbrain.inference.ASR import WhisperASR

asr_model = WhisperASR.from_hparams(source="speechbrain/asr-whisper-medium-commonvoice-fa", savedir="pretrained_models/asr-whisper-medium-commonvoice-fa", run_opts={"device":"cuda"})
asr_model.transcribe_file("speechbrain/asr-whisper-medium-commonvoice-fa/example-fa.mp3")
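
If the same script has to run on machines with and without a GPU, the device can be chosen at runtime. This is a minimal sketch: `choose_run_opts` is a hypothetical helper, and it falls back to the CPU when PyTorch is missing or reports no CUDA device.

```python
def choose_run_opts():
    """Build a run_opts dictionary, preferring CUDA when it is available."""
    try:
        import torch
        has_cuda = torch.cuda.is_available()
    except ImportError:
        # PyTorch is not installed, so fall back to the CPU setting.
        has_cuda = False
    return {"device": "cuda" if has_cuda else "cpu"}


# Example: pass the result as run_opts=choose_run_opts() to from_hparams.
```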

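📚 Detailed Documentation

Pipeline Description

This automatic speech recognition (ASR) system is composed of a Whisper encoder-decoder module:

The pretrained Whisper-medium encoder is frozen.
The pretrained Whisper tokenizer is used.
The pretrained Whisper-medium decoder (openai/whisper-medium) is fine-tuned on the CommonVoice Persian dataset. The resulting acoustic representation is passed to a greedy decoder.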
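
Greedy decoding, mentioned in the last item above, picks the highest-scoring token at every step and stops at the end-of-sequence token. The sketch below shows the idea with a toy scoring function; `greedy_decode`, `toy_scores`, and the token ids are all hypothetical stand-ins, not the SpeechBrain or Whisper implementation.

```python
def greedy_decode(next_token_scores, bos_id, eos_id, max_len=20):
    """Repeatedly append the best-scoring token until EOS or max_len."""
    tokens = [bos_id]
    for _ in range(max_len):
        scores = next_token_scores(tokens)
        best = max(range(len(scores)), key=scores.__getitem__)
        if best == eos_id:
            break
        tokens.append(best)
    return tokens[1:]  # drop the start token


def toy_scores(prefix):
    """Toy stand-in for a decoder over 5 token ids (0 = BOS, 4 = EOS)."""
    target = [1, 2, 3, 4]
    wanted = target[min(len(prefix) - 1, len(target) - 1)]
    return [1.0 if i == wanted else 0.0 for i in range(5)]


print(greedy_decode(toy_scores, bos_id=0, eos_id=4))  # prints [1, 2, 3]
```

Greedy decoding is fast because it keeps a single hypothesis, at the cost of not exploring alternatives the way beam search does.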

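The system is trained on recordings sampled at 16 kHz (single channel). When transcribe_file is called, the code automatically normalizes the audio (that is, resampling and mono channel selection).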
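
To make the two normalization steps concrete, here is a simplified pure-Python sketch: it keeps one channel and resamples by linear interpolation. The helper names `to_mono` and `resample_linear` are hypothetical, and SpeechBrain's own normalizer is likely to differ, for example in how it mixes channels and filters during resampling.

```python
def to_mono(frames, channel=0):
    """Select one channel from a list of per-sample channel tuples."""
    return [frame[channel] for frame in frames]


def resample_linear(samples, src_rate, dst_rate):
    """Resample a mono signal with linear interpolation (illustrative only)."""
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate           # position in the source signal
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


# Example: a stereo clip at 32 kHz reduced to mono at the 16 kHz training rate.
stereo = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.1), (3.0, 2.9)]
mono_16k = resample_linear(to_mono(stereo), 32000, 16000)
```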

Training Steps

To train the model from scratch, follow these steps:

Clone the SpeechBrain repository:

git clone https://github.com/speechbrain/speechbrain/

Install the dependencies:

cd speechbrain
pip install -r requirements.txt
pip install -e .

Run the training script:

cd recipes/CommonVoice/ASR/transformer/
python train_with_whisper.py hparams/train_fa_hf_whisper.yaml --data_folder=your_data_folder

Our training results (models, logs, etc.) are available here.

Limitations

The SpeechBrain team does not provide any guarantee about this model's performance on other datasets.

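🔧 Technical Details

Property	Details
Model type	Whisper-based automatic speech recognition model
Training data	CommonVoice 14.0 (Persian)
Evaluation metrics	Word error rate (WER), character error rate (CER)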

📄 License

This project is released under the Apache 2.0 license.

Citing SpeechBrain

@misc{SB2021,
    author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Ju-Chieh, Chou and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua },
    title = {SpeechBrain},
    year = {2021},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/speechbrain/speechbrain}},
  }