asr-whisper-large-v2-commonvoice-fa: Open-Source Speech Recognition Model

Asr Whisper Large V2 Commonvoice Fa

Developed by SpeechBrain

An automatic speech recognition model based on the whisper-large-v2 architecture, fine-tuned for Persian on the CommonVoice dataset.

Speech Recognition

PyTorch

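License: Apache-2.0 #Persian speech recognition #Whisper large model #low word error rate

Downloads: 103

Released: 1/30/2023

Model Overview

The model performs automatic speech recognition for Persian. It uses the Whisper encoder-decoder architecture and was fine-tuned on the CommonVoice Persian dataset.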

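Model Features

High-performance Persian recognition

Achieves a word error rate (WER) of 31.75% and a character error rate (CER) of 9.38% on the CommonVoice Persian test set.

Built on a pretrained model

Uses the pretrained whisper-large-v2 model as its base, with the encoder kept frozen.

End-to-end training

The whole system is trained end to end, which simplifies the speech recognition pipeline.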

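Model Capabilities

Persian speech recognition

16 kHz audio processing

Automatic audio normalization

Use Cases

Speech transcription

Persian speech transcription

Converts Persian speech to text

Achieves a 31.75% word error rate on the test set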

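🚀 Whisper Large-V2 Fine-Tuned on CommonVoice Persian

This project provides all the tools needed to run an end-to-end Whisper automatic speech recognition model fine-tuned on the CommonVoice (Persian) dataset within the SpeechBrain framework. For a better experience, we encourage you to learn more about SpeechBrain.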

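Model Information

Property	Details
Model type	Automatic speech recognition model: Whisper Large-V2 fine-tuned on the CommonVoice Persian dataset
Training data	CommonVoice 10.0 (Persian)
Evaluation metrics	Word error rate (WER), character error rate (CER)
License	Apache-2.0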

Model Performance

Release date	Test CER (%)	Test WER (%)	GPUs
01-02-23	9.38	31.75	1xV100 16GB
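
To make the two metrics in the table concrete, here is a minimal sketch of how WER and CER can be computed from a reference transcript and a model hypothesis using edit distance. It is an illustration, not the evaluation code used to produce the numbers above: the function names are hypothetical, spaces are counted as characters in the CER, and the rate is computed per utterance, whereas evaluation tools typically sum errors over the whole test set before dividing.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (ref unit missing in hyp)
                curr[j - 1] + 1,     # insertion (extra unit in hyp)
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def error_rate(ref_units, hyp_units):
    """Edit distance divided by reference length, as a percentage."""
    if not ref_units:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref_units, hyp_units) / len(ref_units)


def wer(reference, hypothesis):
    """Word error rate: units are whitespace-separated words."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference, hypothesis):
    """Character error rate: units are characters (spaces included here)."""
    return error_rate(list(reference), list(hypothesis))
```

Because one wrong character makes the whole word wrong, WER is usually well above CER for the same output, which matches the gap between 31.75 and 9.38 in the table.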

🚀 Quick Start

📦 Install SpeechBrain

First, install transformers and SpeechBrain with the following command:

pip install speechbrain transformers==4.28.0

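We encourage you to read the tutorials and learn more about SpeechBrain.

💻 Usage Examples

Basic Usage

The following code transcribes a Persian audio file with the fine-tuned model: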

from speechbrain.inference.ASR import WhisperASR

asr_model = WhisperASR.from_hparams(source="speechbrain/asr-whisper-large-v2-commonvoice-fa", savedir="pretrained_models/asr-whisper-large-v2-commonvoice-fa")
asr_model.transcribe_file("speechbrain/asr-whisper-large-v2-commonvoice-fa/example-fa.wav")
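
When several recordings need transcribing, the single-file call above can be wrapped in a small loop. The sketch below is one possible way to do that; `transcribe_folder` and its parameters are hypothetical names introduced here, not part of SpeechBrain. It takes the transcription function as an argument so that it works with any callable that accepts a file path.

```python
from pathlib import Path


def transcribe_folder(folder, transcribe_fn, pattern="*.wav"):
    """Apply transcribe_fn to every file in folder that matches pattern.

    Returns a dict mapping file name to whatever transcribe_fn returns.
    Files are processed in sorted order so the result is reproducible.
    """
    results = {}
    for path in sorted(Path(folder).glob(pattern)):
        results[path.name] = transcribe_fn(str(path))
    return results
```

With the model loaded as shown above, a call might look like `transcribe_folder("my_recordings", asr_model.transcribe_file)`, where `my_recordings` is a placeholder directory name.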

Advanced Usage

To run inference on a GPU, add run_opts={"device":"cuda"} when calling the from_hparams method:

from speechbrain.inference.ASR import WhisperASR

asr_model = WhisperASR.from_hparams(source="speechbrain/asr-whisper-large-v2-commonvoice-fa", savedir="pretrained_models/asr-whisper-large-v2-commonvoice-fa", run_opts={"device":"cuda"})
asr_model.transcribe_file("speechbrain/asr-whisper-large-v2-commonvoice-fa/example-fa.wav")
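
Hard-coding "cuda" fails on machines without a GPU. The following sketch shows one way to choose the device at run time and fall back to the CPU. `select_run_opts` is a hypothetical helper, not a SpeechBrain function; it relies only on `torch.cuda.is_available()`, and treats a missing PyTorch installation as "no GPU".

```python
def select_run_opts(cuda_available=None):
    """Build a run_opts dict, preferring the GPU when one is usable.

    cuda_available can be passed explicitly (useful for testing);
    when left as None it is detected through PyTorch.
    """
    if cuda_available is None:
        try:
            import torch
            cuda_available = torch.cuda.is_available()
        except ImportError:
            # Without PyTorch there is no CUDA backend to use.
            cuda_available = False
    return {"device": "cuda" if cuda_available else "cpu"}
```

The result can then be passed as `run_opts=select_run_opts()` in the from_hparams call shown above.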

🔧 Training the Model

The model was trained with SpeechBrain. To train it from scratch, follow these steps:

Clone the SpeechBrain repository:

git clone https://github.com/speechbrain/speechbrain/

Install the dependencies:

cd speechbrain
pip install -r requirements.txt
pip install -e .

Run the training script:

cd recipes/CommonVoice/ASR/transformer/
python train_with_whisper.py hparams/train_fa_hf_whisper.yaml --data_folder=your_data_folder

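You can find the training results (models, logs, etc.) here.

📚 Detailed Documentation

Pipeline Description

This automatic speech recognition (ASR) system consists of Whisper encoder-decoder blocks:

The pretrained whisper-large-v2 encoder is frozen.
The pretrained Whisper tokenizer is used.
The pretrained Whisper-large-v2 decoder (openai/whisper-large-v2) is fine-tuned on the CommonVoice Persian dataset. The final acoustic representation is then passed to a greedy decoder.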
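
Greedy decoding, the last step in the list above, can be sketched in a few lines: at every step the decoder scores all vocabulary tokens given the tokens emitted so far, and the single highest-scoring token is kept until the end-of-sequence token appears. The code below is a toy illustration under stated assumptions, not SpeechBrain's or Whisper's implementation. `step_fn`, `toy_step`, and the token ids are hypothetical stand-ins for the real decoder and tokenizer.

```python
def greedy_decode(step_fn, bos_id, eos_id, max_len=50):
    """Pick the highest-scoring token at each step until EOS or max_len.

    step_fn(tokens) must return a list of scores, one per vocabulary id,
    for the next token given the tokens generated so far.
    """
    tokens = [bos_id]
    for _ in range(max_len):
        scores = step_fn(tokens)
        next_id = max(range(len(scores)), key=scores.__getitem__)  # argmax
        if next_id == eos_id:
            break
        tokens.append(next_id)
    return tokens[1:]  # drop the start token


def toy_step(tokens):
    """Stand-in decoder over a 4-token vocabulary that follows a fixed script."""
    script = [1, 2, 2, 3]  # id 3 plays the role of EOS
    position = len(tokens) - 1
    target = script[position] if position < len(script) else 3
    scores = [0.0] * 4
    scores[target] = 1.0
    return scores


decoded = greedy_decode(toy_step, bos_id=0, eos_id=3)
```

Greedy search keeps only one hypothesis, so it is fast but cannot recover from an early wrong token the way beam search can.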

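The system was trained on recordings sampled at 16 kHz (single channel). When you call transcribe_file, the code automatically normalizes the audio (resampling and mono channel selection).

Limitations

The SpeechBrain team provides no guarantee about the model's performance on other datasets.

Citing SpeechBrain

If you use this project, please cite the following: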
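
To illustrate what normalizing to 16 kHz mono involves, here is a deliberately simple, dependency-free sketch. It is not the code that transcribe_file runs. The function names are hypothetical, channels are averaged (one common way to obtain a single channel), and resampling uses plain linear interpolation, whereas production resamplers apply a low-pass filter to avoid aliasing.

```python
def to_mono(frames):
    """Average the channels of each frame.

    frames is a list of per-frame tuples, e.g. [(left, right), ...].
    """
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(samples, src_rate, dst_rate):
    """Naive resampling by linear interpolation between neighboring samples."""
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # position in the source signal
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


def normalize(frames, src_rate, dst_rate=16000):
    """Convert multi-channel audio at src_rate to mono at dst_rate."""
    return resample_linear(to_mono(frames), src_rate, dst_rate)
```

For example, four stereo frames recorded at 32 kHz become two mono samples at 16 kHz, because halving the sample rate halves the number of samples.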

@misc{SB2021,
    author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Ju-Chieh, Chou and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua },
    title = {SpeechBrain},
    year = {2021},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/speechbrain/speechbrain}},
  }