asr-whisper-medium-commonvoice-ar: Open-Source Speech Recognition Model

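Developed by SpeechBrain

A Whisper medium speech recognition model fine-tuned on the CommonVoice Arabic dataset, developed by the SpeechBrain team.

Speech recognition

PyTorch

Language: Arabic. License: Apache-2.0. Tags: #ArabicSpeechRecognition #LowWER #CommonVoiceFineTuning

Downloads: 17

Released: 7/20/2023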

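Model Overview

This model is an automatic speech recognition system based on the Whisper medium architecture. It targets Arabic specifically and was fine-tuned on the CommonVoice Arabic dataset.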

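Model Features

High-accuracy Arabic recognition

Reaches a WER of 14.82% on the CommonVoice Arabic test set

Based on the Whisper architecture

Fine-tuned from the pretrained OpenAI Whisper medium model

End-to-end training

A complete encoder-decoder architecture that outputs text directly

Automatic audio processing

Built-in audio normalization (resampling and mono channel selection)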

Model Capabilities

Arabic speech recognition

Audio transcription

Processing of 16 kHz single-channel audio

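Use Cases

Speech transcription

Arabic speech-to-text

Converts spoken Arabic content into text

Test set WER 14.82%, CER 4.95%

Voice assistants

Arabic voice command recognition

Serves as the front-end speech recognition module of an Arabic voice assistant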

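🚀 Whisper Medium Fine-Tuned on CommonVoice-14.0 Arabic

This repository provides all the tools needed to perform automatic speech recognition in SpeechBrain with an end-to-end Whisper model fine-tuned on the CommonVoice (Arabic) dataset. For a better experience, we encourage you to learn more about SpeechBrain.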

The performance of the model is as follows:

Release	Test CER (%)	Test WER (%)	GPUs
August 1, 2023	4.95	14.82	1xV100 32GB
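
For readers unfamiliar with the two metrics in the table, the sketch below shows how WER and CER are commonly computed: the Levenshtein edit distance between reference and hypothesis, divided by the reference length, at word level and character level respectively. This is an illustrative stand-alone implementation, not the scoring code used by the SpeechBrain recipe; the function names are hypothetical, and details such as text normalization or whether spaces count as characters are assumptions that may differ from the official evaluation.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of tokens."""
    prev = list(range(len(hyp) + 1))  # distance from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance as a percentage of the reference length."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def wer(reference, hypothesis):
    """Word error rate: tokens are whitespace-separated words."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference, hypothesis):
    """Character error rate: tokens are characters (spaces included here)."""
    return error_rate(list(reference), list(hypothesis))
```

Note that a corpus-level score is usually obtained by summing edit distances and reference lengths over all utterances before dividing, rather than by averaging per-utterance rates.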

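✨ Key Features

This automatic speech recognition (ASR) system is built from the Whisper encoder-decoder blocks:
The pretrained Whisper-medium encoder is frozen.
The pretrained Whisper tokenizer is used.
The pretrained Whisper-medium decoder (openai/whisper-medium) is fine-tuned on the CommonVoice Arabic dataset.
The resulting acoustic representation is passed to a greedy decoder.
The system was trained on recordings sampled at 16 kHz (single channel). When transcribe_file is called, the code automatically normalizes the audio (that is, resampling and mono channel selection).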
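
To make the normalization step concrete, here is a minimal pure-Python sketch of the two operations named above: selecting one channel from multi-channel audio and resampling to 16 kHz. It is only an illustration under stated assumptions. The function names are hypothetical, the resampler uses plain linear interpolation (production resamplers apply band-limited filtering to avoid aliasing), and SpeechBrain's own normalizer may behave differently.

```python
def select_channel(frames, channel=0):
    """Keep a single channel from frames shaped like [(left, right), ...]."""
    return [frame[channel] for frame in frames]


def resample_linear(samples, src_rate, dst_rate=16000):
    """Resample a mono signal by linear interpolation (crude, no anti-aliasing)."""
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate  # input samples advanced per output sample
    out = []
    for k in range(n_out):
        pos = k * step
        i = int(pos)       # index of the input sample to the left
        frac = pos - i     # fractional distance toward the next sample
        nxt = samples[min(i + 1, len(samples) - 1)]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
    return out


# Example: a short stereo clip at 32 kHz reduced to mono at 16 kHz
stereo = [(0.0, 9.0), (1.0, 9.0), (2.0, 9.0), (3.0, 9.0)]
mono_16k = resample_linear(select_channel(stereo), src_rate=32000)
```

In practice there is no need to do this by hand, since transcribe_file performs the normalization for you; the sketch only shows what kind of transformation is involved.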

📦 Installation

First, install transformers and SpeechBrain with the following command:

pip install speechbrain transformers

We recommend reading the tutorials to learn more about SpeechBrain.

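💻 Usage Examples

Basic usage

Transcribe your own Arabic audio file:

from speechbrain.inference.ASR import WhisperASR

# Load the pretrained model and cache its files in savedir
asr_model = WhisperASR.from_hparams(source="speechbrain/asr-whisper-medium-commonvoice-ar", savedir="pretrained_models/asr-whisper-medium-commonvoice-ar")
# Transcribe the example clip; replace the path with your own audio file
asr_model.transcribe_file("speechbrain/asr-whisper-medium-commonvoice-ar/example-ar.mp3")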
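
When several recordings need transcribing, the single-file call can be wrapped in a small loop. The helper below is a hypothetical convenience function, not part of SpeechBrain: it walks a folder, picks files with an assumed set of audio extensions, and applies whatever transcription callable it is given. Passing the callable in keeps the helper independent of the model and easy to test.

```python
from pathlib import Path

# Assumed set of extensions to treat as audio; adjust to your data
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac"}


def transcribe_folder(folder, transcribe_fn):
    """Apply transcribe_fn to every audio file in folder, in name order.

    Returns a dict mapping file name to whatever transcribe_fn returns.
    """
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS:
            results[path.name] = transcribe_fn(str(path))
    return results
```

With the model loaded as in the basic example, a call would look like transcribe_folder("my_recordings", asr_model.transcribe_file), where "my_recordings" is a placeholder folder name.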

Advanced usage

To run inference on a GPU, add run_opts={"device":"cuda"} when calling the from_hparams method.
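
A script that must run on machines with and without a GPU can choose the device at load time. The sketch below is one possible way to do that; pick_device and load_kwargs are hypothetical helper names, and the source and savedir values are simply copied from the basic usage example. It falls back to the CPU when PyTorch is missing or reports no CUDA device.

```python
def pick_device():
    """Return "cuda" when PyTorch sees a CUDA device, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def load_kwargs(device=None):
    """Build keyword arguments for from_hparams, including run_opts."""
    return {
        "source": "speechbrain/asr-whisper-medium-commonvoice-ar",
        "savedir": "pretrained_models/asr-whisper-medium-commonvoice-ar",
        "run_opts": {"device": device or pick_device()},
    }
```

The model would then be loaded with WhisperASR.from_hparams(**load_kwargs()), or with load_kwargs("cuda") to request the GPU explicitly.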

🔧 Technical Details

Training steps

The model was trained with SpeechBrain. To train it from scratch, follow these steps:

Clone the SpeechBrain repository:

git clone https://github.com/speechbrain/speechbrain/

Install the dependencies:

cd speechbrain
pip install -r requirements.txt
pip install -e .

Run the training script:

cd recipes/CommonVoice/ASR/transformer/
python train_with_whisper.py hparams/train_ar_hf_whisper.yaml --data_folder=your_data_folder
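
Before launching a long training run, it can be worth checking that the folder passed as --data_folder looks like an extracted Common Voice release. The sketch below is a hypothetical pre-flight check, not part of the recipe. The expected entry names (train.tsv, dev.tsv, test.tsv and a clips directory) are an assumption based on the usual Common Voice layout, so verify them against your download and the recipe's data preparation script.

```python
from pathlib import Path

# Assumed Common Voice layout; confirm against your extracted archive
EXPECTED_ENTRIES = ("train.tsv", "dev.tsv", "test.tsv", "clips")


def missing_entries(data_folder, expected=EXPECTED_ENTRIES):
    """Return the expected files or directories absent from data_folder."""
    root = Path(data_folder)
    return [name for name in expected if not (root / name).exists()]
```

An empty list means every expected entry was found; anything else names what is missing, for example missing_entries("your_data_folder").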

The training results (models, logs, etc.) are available here.

Limitations

The SpeechBrain team does not provide any guarantee about the performance of this model on other datasets.

Citing SpeechBrain

@misc{SB2021,
    author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Ju-Chieh, Chou and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua },
    title = {SpeechBrain},
    year = {2021},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/speechbrain/speechbrain}},
  }

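About SpeechBrain

SpeechBrain is an open-source, all-in-one speech toolkit. It is designed to be simple, highly flexible, and user-friendly, and it achieves competitive or state-of-the-art performance in several domains.

Website: https://speechbrain.github.io/
GitHub: https://github.com/speechbrain/speechbrain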

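📄 License

This project is released under the Apache-2.0 license.

Property	Details
Model type	Whisper-based automatic speech recognition model
Training data	CommonVoice Arabic dataset
Evaluation metrics	Word error rate (WER), character error rate (CER)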