asr-whisper-medium-commonvoice-ar: open-source speech recognition model


Asr Whisper Medium Commonvoice Ar

Developed by speechbrain

A Whisper medium speech recognition model fine-tuned on the CommonVoice Arabic dataset, developed by the SpeechBrain team

Speech recognition

PyTorch

Arabic · License: Apache-2.0 · #Arabic speech recognition #Low WER #CommonVoice fine-tuning

Downloads: 17

Release date: 7/20/2023

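Model overview

This model is an automatic speech recognition (ASR) system based on the Whisper medium architecture. It is specialized for Arabic by fine-tuning on the CommonVoice Arabic dataset.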

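Model features

High-accuracy Arabic recognition

Achieves a 14.82% word error rate (WER) on the CommonVoice Arabic test set

Based on Whisper

Fine-tuned from the pretrained OpenAI Whisper medium model

End-to-end model

Full encoder-decoder architecture that outputs text directly

Automatic audio preprocessing

Built-in audio normalization (resampling and mono channel selection)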

Model capabilities

Arabic speech recognition

Audio transcription

Processing of 16 kHz mono audio

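Use cases

Speech transcription

Arabic speech-to-text

Converts spoken Arabic into text

Test-set WER 14.82%, CER 4.95%

Voice assistants

Arabic voice command recognition

Speech recognition front end for Arabic voice assistants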

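🚀 Whisper Medium fine-tuned on CommonVoice 14.0 Arabic

This repository provides everything needed to perform automatic speech recognition in SpeechBrain with an end-to-end Whisper model fine-tuned on CommonVoice (Arabic). For a better experience, we encourage you to learn more about SpeechBrain.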

The model's performance is as follows:

Release	Test CER (%)	Test WER (%)	GPUs
2023-08-01	4.95	14.82	1xV100 32GB
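The WER and CER figures above are edit-distance ratios. Each one counts the substitutions, deletions and insertions needed to turn the model's transcript into the reference. That count is divided by the reference length (in words for WER, in characters for CER) and multiplied by 100.

The pure-Python sketch below illustrates that definition. It is not SpeechBrain's scoring code, and the function names are our own. It applies no Arabic text normalization (for example of diacritics or punctuation) and counts spaces as characters for CER, so on real transcripts its numbers may differ from the table.

```python
from typing import Iterable, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (one DP row at a time)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution, or free match
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate (%) for one utterance; reference must be non-empty."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate (%) for one utterance; spaces count as characters."""
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus-level WER (%): total word edits divided by total reference words."""
    errors = words = 0
    for reference, hypothesis in pairs:
        ref_words = reference.split()
        errors += edit_distance(ref_words, hypothesis.split())
        words += len(ref_words)
    return 100.0 * errors / words
```

corpus_wer pools errors across all utterances before dividing, which is the usual way test-set scores are reported. Averaging per-utterance wer values instead would over-weight short utterances.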

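✨ Key features

This automatic speech recognition (ASR) system is composed of a Whisper encoder-decoder:
The pretrained Whisper-medium encoder is frozen.
The pretrained Whisper tokenizer is used.
The pretrained Whisper-medium decoder (openai/whisper-medium) is fine-tuned on the CommonVoice Arabic dataset.
The resulting acoustic representation is passed to a greedy decoder.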
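In greedy decoding, the decoder emits the single highest-scoring token at every step and feeds it back as input for the next step. Beam search, by contrast, keeps several candidate hypotheses.

The sketch below shows only that loop, and it is not SpeechBrain's decoder API. The step_fn callback is a hypothetical stand-in for the Whisper decoder conditioned on the encoder output. The single bos_id simplifies Whisper's multi-token prompt.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: given the tokens so far, return one score per vocabulary entry.
StepFn = Callable[[Sequence[int]], Sequence[float]]


def greedy_decode(step_fn: StepFn, bos_id: int, eos_id: int, max_len: int = 100) -> List[int]:
    """Repeatedly append the argmax token until EOS is predicted or max_len is reached."""
    output: List[int] = []
    for _ in range(max_len):
        scores = step_fn([bos_id] + output)
        next_id = max(range(len(scores)), key=lambda i: scores[i])  # argmax
        if next_id == eos_id:
            break  # EOS ends the hypothesis and is not included in it
        output.append(next_id)
    return output


def toy_step(prefix: Sequence[int]) -> List[float]:
    """Hypothetical scripted decoder over a 5-token vocabulary: emits 3, 4, then EOS (1)."""
    script = [3, 4, 1]
    target = script[min(len(prefix) - 1, len(script) - 1)]
    return [1.0 if i == target else 0.0 for i in range(5)]


print(greedy_decode(toy_step, bos_id=0, eos_id=1))  # [3, 4]
```

Because each step commits to one token, greedy decoding is cheaper than beam search. The trade-off is that an early mistake cannot be revised later.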
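The system was trained on recordings sampled at 16 kHz (single channel). When you call transcribe_file, the code automatically normalizes your audio (resampling and mono channel selection) if needed.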
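The normalization described above comes down to two steps: collapse multi-channel audio to one channel, then resample to the 16 kHz rate the model was trained on. The pure-Python sketch below illustrates both steps on plain lists of samples. It is not SpeechBrain's implementation, and the function names are hypothetical.

Two choices in the sketch are our own assumptions:

- Averaging the channels is our assumption for how to obtain mono; the text only says "mono channel selection".
- Linear interpolation without an anti-aliasing filter is used only for brevity. For real audio, rely on a proper resampler such as the one transcribe_file already applies.

```python
from typing import List, Sequence

TARGET_SR = 16000  # sampling rate the model was trained on


def to_mono(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average the channels of each frame (frames[t][c]) into a single sample."""
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(samples: Sequence[float], orig_sr: int, target_sr: int = TARGET_SR) -> List[float]:
    """Naive linear-interpolation resampler; no low-pass filter, so it can alias."""
    if orig_sr == target_sr or not samples:
        return list(samples)
    n_out = len(samples) * target_sr // orig_sr
    out: List[float] = []
    for k in range(n_out):
        pos = k * orig_sr / target_sr  # output sample k in input-sample units
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, len(samples) - 1)]  # clamp at the last sample
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
    return out


def normalize(frames: Sequence[Sequence[float]], orig_sr: int) -> List[float]:
    """Mono mix-down followed by resampling to 16 kHz."""
    return resample_linear(to_mono(frames), orig_sr)
```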

📦 Installation

First, install transformers and SpeechBrain with the following command:

pip install speechbrain transformers
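A short standard-library check like the following confirms that both packages are installed in the active environment. It also shows which SpeechBrain release you have, which matters because import paths such as speechbrain.inference.ASR, used in the usage example, are not the same in every release.

installed_version is a hypothetical helper name. importlib.metadata requires Python 3.8 or later.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


for pkg in ("speechbrain", "transformers"):
    print(f"{pkg}: {installed_version(pkg) or 'not installed'}")
```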

We encourage you to read the SpeechBrain tutorials to learn more about the toolkit.

💻 Usage examples

Basic usage

To transcribe your own Arabic audio file:

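from speechbrain.inference.ASR import WhisperASR

# Download the fine-tuned model from the Hugging Face Hub and cache it in savedir
asr_model = WhisperASR.from_hparams(source="speechbrain/asr-whisper-medium-commonvoice-ar", savedir="pretrained_models/asr-whisper-medium-commonvoice-ar")
# Transcribe the example file shipped with the model repository; returns the text
asr_model.transcribe_file("speechbrain/asr-whisper-medium-commonvoice-ar/example-ar.mp3")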

Advanced usage

Inference on GPU: add run_opts={"device":"cuda"} when calling the from_hparams method.
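Building on the basic example, the sketch below passes run_opts={"device": ...} to from_hparams. It falls back to the CPU when PyTorch reports no usable GPU.

The helper names pick_device and load_asr are hypothetical. The automatic fallback is our own design choice, not something SpeechBrain does for you. The speechbrain import is deferred until load_asr is called, so the device check can run on its own.

```python
from typing import Optional

MODEL_SOURCE = "speechbrain/asr-whisper-medium-commonvoice-ar"
SAVE_DIR = "pretrained_models/asr-whisper-medium-commonvoice-ar"


def pick_device() -> str:
    """Use CUDA when PyTorch reports a usable GPU, otherwise fall back to CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def load_asr(device: Optional[str] = None):
    """Load the model on the chosen device through run_opts (import deferred until called)."""
    from speechbrain.inference.ASR import WhisperASR

    return WhisperASR.from_hparams(
        source=MODEL_SOURCE,
        savedir=SAVE_DIR,
        run_opts={"device": device or pick_device()},
    )
```

For example, call load_asr() and then call transcribe_file on the returned model exactly as in the basic example. Pass device="cpu" to force CPU inference.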

🔧 Technical details

Training

The model was trained with SpeechBrain. To reproduce the fine-tuning, follow these steps:

Clone the SpeechBrain repository:

git clone https://github.com/speechbrain/speechbrain/

Install the dependencies:

cd speechbrain
pip install -r requirements.txt
pip install -e .

Run the training script:

cd recipes/CommonVoice/ASR/transformer/
python train_with_whisper.py hparams/train_ar_hf_whisper.yaml --data_folder=your_data_folder

You can find the training results (models, logs, etc.) here.

Limitations

The SpeechBrain team does not guarantee the performance of this model on other datasets.

Citing SpeechBrain

@misc{SB2021,
    author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Ju-Chieh, Chou and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua },
    title = {SpeechBrain},
    year = {2021},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/speechbrain/speechbrain}},
  }

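About SpeechBrain

SpeechBrain is an open-source, all-in-one speech toolkit. It is designed to be simple, extremely flexible and user-friendly, and it achieves competitive or state-of-the-art performance in various domains.

Website: https://speechbrain.github.io/
GitHub: https://github.com/speechbrain/speechbrain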

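📄 License

This project is released under the Apache-2.0 license.

Property	Details
Model type	Whisper-based automatic speech recognition model
Training data	CommonVoice Arabic dataset
Evaluation metrics	Word error rate (WER), character error rate (CER)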