asr-whisper-medium-commonvoice-fa open-source model: free-to-deploy Persian automatic speech recognition

Asr Whisper Medium Commonvoice Fa

Developed by speechbrain

A Whisper medium model fine-tuned on the CommonVoice 14.0 Persian dataset for Persian automatic speech recognition.

Speech recognition

PyTorch

License: Apache-2.0 #Persian speech recognition #Whisper fine-tuning #Low word error rate

Downloads: 21

Released: 7/20/2023

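Model overview

This model is an automatic speech recognition system built on the whisper-medium architecture and fine-tuned specifically for Persian. It converts Persian audio into text.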

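Model features

Fine-tuned from a pre-trained model

The model starts from the pre-trained whisper-medium checkpoint and is fine-tuned on Persian data, so it keeps the feature-extraction ability of the original model.

Efficient training

The pre-trained Whisper encoder is frozen and only the decoder is fine-tuned, which reduces training cost.

Automatic audio processing

Audio normalization is built in, including automatic resampling and mono channel selection.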

Model capabilities

Persian speech recognition

Audio transcription

Speech-to-text

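Use cases

Speech transcription

Persian speech-to-text

Converts Persian audio files into text.

Reaches a word error rate of 35.48% on the CommonVoice test set.

Voice assistants

Persian voice command recognition

Serves as the base recognition module when building a Persian voice assistant.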

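🚀 Whisper Medium fine-tuned on CommonVoice 14.0 Persian

This repository provides all the tools needed to perform automatic speech recognition in SpeechBrain with an end-to-end Whisper model fine-tuned on CommonVoice (Persian). For a better experience, we encourage you to learn more about SpeechBrain.

The model's performance is as follows: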

Release	Test CER (%)	Test WER (%)	GPUs
August 1, 2023	11.27	35.48	1xV100 32GB
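
For readers unfamiliar with the two metrics in the table, the following is a minimal sketch of how WER and CER are commonly computed: the edit distance between reference and hypothesis, divided by the reference length, over words or characters respectively. It is not the SpeechBrain scoring code. The function names are hypothetical, and counting spaces as characters in the CER is an assumption, since toolkits differ on that detail.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def error_rate(ref_units, hyp_units):
    """Edit distance as a percentage of the reference length."""
    if not ref_units:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref_units, hyp_units) / len(ref_units)


def wer(reference, hypothesis):
    """Word error rate: units are whitespace-separated words."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference, hypothesis):
    """Character error rate: units are characters (spaces included, by assumption)."""
    return error_rate(list(reference), list(hypothesis))
```

With these definitions, one substituted word in a four-word reference gives a WER of 25%. A single wrong character makes its whole word count as an error, which is one reason a WER can sit well above the CER for the same output.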

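🚀 Quick start

This repository provides the tools for running automatic speech recognition in SpeechBrain with an end-to-end Whisper model fine-tuned on CommonVoice (Persian).

✨ Key features

Automatic speech recognition based on a fine-tuned Whisper model.
Detailed installation, usage, and training steps.
Performance metrics on the test set.

📦 Installation

First, install transformers and SpeechBrain with the following command: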

pip install speechbrain transformers

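We encourage you to read our tutorials to learn more about SpeechBrain.

💻 Usage examples

Basic usage

The following example transcribes a Persian audio file of your own: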

from speechbrain.inference.ASR import WhisperASR

asr_model = WhisperASR.from_hparams(source="speechbrain/asr-whisper-medium-commonvoice-fa", savedir="pretrained_models/asr-whisper-medium-commonvoice-fa")
asr_model.transcribe_file("speechbrain/asr-whisper-medium-commonvoice-fa/example-fa.mp3")
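
When several recordings need transcribing, the same `transcribe_file` call can be wrapped in a loop. The sketch below is one way to do that. `transcribe_folder`, `load_model`, and the list of file suffixes are hypothetical names and assumptions, not part of SpeechBrain. The value returned by `transcribe_file` is stored unchanged, because its exact type may differ between SpeechBrain versions.

```python
from pathlib import Path

# Assumption: which file extensions count as audio for this sketch.
AUDIO_SUFFIXES = {".wav", ".mp3", ".flac"}


def transcribe_folder(asr_model, folder):
    """Call asr_model.transcribe_file on every audio file in a folder.

    Returns a dict mapping file name to whatever transcribe_file returned.
    """
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in AUDIO_SUFFIXES:
            results[path.name] = asr_model.transcribe_file(str(path))
    return results


def load_model():
    """Load the model as in the basic usage example (imported lazily)."""
    from speechbrain.inference.ASR import WhisperASR

    return WhisperASR.from_hparams(
        source="speechbrain/asr-whisper-medium-commonvoice-fa",
        savedir="pretrained_models/asr-whisper-medium-commonvoice-fa",
    )
```

A call such as `transcribe_folder(load_model(), "my_persian_audio")` would then return one entry per audio file, where `my_persian_audio` is a placeholder folder name.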

Advanced usage

To run inference on a GPU, add run_opts={"device":"cuda"} when calling the from_hparams method:

from speechbrain.inference.ASR import WhisperASR

asr_model = WhisperASR.from_hparams(source="speechbrain/asr-whisper-medium-commonvoice-fa", savedir="pretrained_models/asr-whisper-medium-commonvoice-fa", run_opts={"device":"cuda"})
asr_model.transcribe_file("speechbrain/asr-whisper-medium-commonvoice-fa/example-fa.mp3")
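
On machines that may or may not have a GPU, the device can be chosen at run time instead of being hard-coded. This is a small sketch under that assumption. `pick_device` and `build_run_opts` are hypothetical helper names, and falling back to the CPU when PyTorch is missing is a choice made for this example only.

```python
def pick_device():
    """Return "cuda" if PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Assumption for this sketch: without PyTorch, report the CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def build_run_opts():
    """Build the run_opts dictionary expected by from_hparams."""
    return {"device": pick_device()}


# Usage: pass the result to from_hparams, for example
#   WhisperASR.from_hparams(source=..., savedir=..., run_opts=build_run_opts())
```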

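📚 Detailed documentation

Pipeline description

This automatic speech recognition (ASR) system consists of the Whisper encoder-decoder blocks:

The pre-trained Whisper-medium encoder is frozen.
The pre-trained Whisper tokenizer is used.
The pre-trained Whisper-medium decoder (openai/whisper-medium) is fine-tuned on the CommonVoice Persian dataset. The resulting acoustic representation is passed to a greedy decoder.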
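
Greedy decoding, mentioned in the last item, can be illustrated with a toy example: at every step the highest-scoring token is kept, and decoding stops at the end-of-sequence token or at a length limit. This is only a sketch of the technique, not the SpeechBrain or Whisper implementation. `step_fn`, `toy_step`, and the token ids are made up for the illustration.

```python
def greedy_decode(step_fn, bos_id, eos_id, max_len=50):
    """Greedy search: always keep the single best-scoring next token.

    step_fn(tokens) must return one score per vocabulary entry for the
    next position, given the tokens decoded so far.
    """
    tokens = [bos_id]
    for _ in range(max_len):
        scores = step_fn(tokens)
        next_id = max(range(len(scores)), key=scores.__getitem__)
        if next_id == eos_id:
            break
        tokens.append(next_id)
    return tokens[1:]  # drop the start-of-sequence token


def toy_step(tokens):
    """Stand-in for a decoder: a scripted 4-token vocabulary (id 3 is EOS)."""
    script = {
        1: [0.1, 0.7, 0.1, 0.1],  # after BOS, token 1 scores highest
        2: [0.1, 0.1, 0.6, 0.2],  # then token 2
    }
    return script.get(len(tokens), [0.0, 0.0, 0.1, 0.9])  # then EOS
```

Greedy search is cheap because it tracks a single hypothesis, at the cost of possibly missing a sequence that a wider search such as beam search would find.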

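The system is trained on recordings sampled at 16 kHz (single channel). When transcribe_file is called, the code automatically normalizes the audio (that is, it resamples it and selects a single channel) as needed.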
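
To make the two normalization steps concrete, here is a deliberately naive, dependency-free sketch. It is not what SpeechBrain runs internally. Averaging the channels is one assumed way to obtain a mono signal, and linear interpolation stands in for the band-limited resampling a real audio library would use. All function names are hypothetical.

```python
def to_mono(channels):
    """Average equal-length channels into a single channel."""
    n = len(channels)
    return [sum(frame) / n for frame in zip(*channels)]


def resample_linear(samples, src_rate, dst_rate):
    """Resample by linear interpolation (no anti-aliasing filter)."""
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # position in the source signal
        left = min(int(pos), len(samples) - 1)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


def normalize(channels, src_rate, dst_rate=16000):
    """Mix down to mono, then bring the signal to the target sample rate."""
    return resample_linear(to_mono(channels), src_rate, dst_rate)
```

For example, one second of stereo audio at 48 kHz passed through `normalize` comes out as a single channel of 16000 samples.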

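Training steps

To train the model from scratch (that is, to rerun the fine-tuning recipe), follow these steps:

Clone the SpeechBrain repository: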

git clone https://github.com/speechbrain/speechbrain/

Install the dependencies:

cd speechbrain
pip install -r requirements.txt
pip install -e .

Run the training script:

cd recipes/CommonVoice/ASR/transformer/
python train_with_whisper.py hparams/train_fa_hf_whisper.yaml --data_folder=your_data_folder

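You can find our training results (models, logs, etc.) here.

Limitations

The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.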

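🔧 Technical details

Property	Details
Model type	Whisper-based automatic speech recognition model
Training data	CommonVoice 14.0 (Persian)
Evaluation metrics	Word error rate (WER), character error rate (CER)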

📄 License

This project is released under the Apache 2.0 license.

Citing SpeechBrain

@misc{SB2021,
    author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Ju-Chieh, Chou and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua },
    title = {SpeechBrain},
    year = {2021},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/speechbrain/speechbrain}},
}