asr-wav2vec2-commonvoice-fr open-source speech recognition model: free French speech recognition, no language model required

Asr Wav2vec2 Commonvoice Fr

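Developed by speechbrain

A wav2vec 2.0 speech recognition model trained on the CommonVoice French dataset, using a CTC/Attention architecture, with no language model required

Speech recognition

PyTorch

French | License: Apache-2.0 | #French speech recognition #wav2vec2 fine-tuning #no language model

Downloads: 250

Released: 3/2/2022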

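Model overview

An end-to-end French automatic speech recognition system, fine-tuned from a pretrained wav2vec 2.0 model and intended for French speech-to-text tasks.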

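Model features

Fine-tuned from a pretrained model

Fine-tuned from the LeBenchmark/wav2vec2-FR-7K-large pretrained model to improve French recognition accuracy

No language model required

The system uses a CTC greedy decoder directly, with no additional language model

Efficient training

Training was completed on two V100 32GB GPUs, a relatively modest resource budget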

Model capabilities

French speech recognition

Audio transcription

Processing of audio sampled at 16 kHz

Use cases

Speech-to-text

French speech transcription

Converts French speech into text

Test WER 9.96%, CER 3.19%

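🚀 wav2vec 2.0 with CTC/Attention trained on CommonVoice French (no language model)

This repository provides all the tools needed to perform automatic speech recognition in SpeechBrain with an end-to-end system pretrained on CommonVoice (French). For a better experience, we encourage you to learn more about SpeechBrain.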

Model information

Property	Details
Model type	Automatic speech recognition
Tags	CTC, pytorch, speechbrain, Transformer, hf-asr-leaderboard
License	apache-2.0
Training dataset	commonvoice
Evaluation metrics	wer, cer

Model performance

Release date	Test CER (%)	Test WER (%)	GPUs
August 24, 2021	3.19	9.96	2 × V100 32GB
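
To make the two metrics in the table concrete, here is a minimal sketch of how a word error rate and a character error rate can be computed from an edit distance. It is not the scoring code used to produce the numbers above: the function names `edit_distance`, `wer`, and `cer` are illustrative, and the sketch scores a single reference/hypothesis pair, whereas reported scores are normally aggregated over a whole test set.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference item
                            curr[j - 1] + 1,      # insertion of a hypothesis item
                            prev[j - 1] + cost))  # substitution or exact match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent for one non-empty reference transcript."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent (spaces count as characters here)."""
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)
```

For example, scoring the hypothesis "le chat dors sur lit" against the reference "le chat dort sur le lit" counts one substituted word and one deleted word out of six reference words.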

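🚀 Quick start

This automatic speech recognition (ASR) system consists of two distinct but linked modules:

Tokenizer (unigram): converts words into subword units and is trained on the training transcriptions (train.tsv) of CommonVoice (French).
Acoustic model (wav2vec2.0 + CTC): combines a pretrained wav2vec 2.0 model ([LeBenchmark/wav2vec2-FR-7K-large](https://huggingface.co/LeBenchmark/wav2vec2-FR-7K-large)) with two deep neural network (DNN) layers and is fine-tuned on the CommonVoice French dataset. The resulting acoustic representation is fed to a CTC greedy decoder.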
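
The CTC greedy decoding step mentioned above can be sketched in a few lines: take the highest-scoring token for each frame, merge consecutive repeats, and drop the blank symbol. This is an illustration of the general technique, not SpeechBrain's implementation. The function name `ctc_greedy_decode` and the choice of index 0 for the blank token are assumptions, and mapping the resulting subword ids back to text with the unigram tokenizer is not shown.

```python
def ctc_greedy_decode(frame_scores, blank_id=0):
    """Collapse per-frame argmax predictions into a token id sequence.

    frame_scores: one list of scores per frame, one score per token id.
    blank_id: index of the CTC blank symbol (assumed to be 0 in this sketch).
    """
    decoded = []
    previous = None
    for scores in frame_scores:
        # Greedy choice: the best-scoring token id for this frame.
        best = max(range(len(scores)), key=scores.__getitem__)
        # Keep a token only when it differs from the previous frame and is not blank.
        if best != previous and best != blank_id:
            decoded.append(best)
        previous = best
    return decoded
```

With a frame-level best path of `0 1 1 0 1 2 2 0`, the function returns `[1, 1, 2]`: the blank between the two runs of `1` is what lets the same token appear twice in a row.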

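The system was trained on recordings sampled at 16 kHz (single channel). When transcribe_file is called, the code automatically normalizes the audio as needed (that is, resampling and mono channel selection).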
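
To illustrate what this normalization involves, the sketch below converts multi-channel audio to mono and resamples it to 16 kHz using plain Python lists. It is only a conceptual illustration and not the code that transcribe_file runs: `to_mono` and `resample_linear` are hypothetical helpers, downmixing by averaging channels is an assumption (keeping a single channel is another option), and the linear interpolation used here has no anti-aliasing filter, so it is unsuitable for production use.

```python
def to_mono(frames):
    """Average the channels of each frame: [[left, right], ...] -> [sample, ...]."""
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(samples, source_rate, target_rate=16000):
    """Naive linear-interpolation resampler (illustrative, no anti-aliasing)."""
    if source_rate == target_rate or len(samples) < 2:
        return list(samples)
    n_out = int(len(samples) * target_rate / source_rate)
    step = source_rate / target_rate  # input samples advanced per output sample
    out = []
    for i in range(n_out):
        pos = i * step
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        # Interpolate between the two nearest input samples.
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out
```

A stereo recording at 32 kHz would go through `resample_linear(to_mono(frames), 32000)`, which halves the number of samples.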

📦 Installation

First, install transformers and SpeechBrain with the following command:

pip install speechbrain transformers

We recommend reading our tutorials to learn more about SpeechBrain.

💻 Usage examples

Basic usage

Transcribe your own French audio files:

from speechbrain.inference.ASR import EncoderASR

# Fetch the pretrained model and store its files under savedir.
asr_model = EncoderASR.from_hparams(source="speechbrain/asr-wav2vec2-commonvoice-fr", savedir="pretrained_models/asr-wav2vec2-commonvoice-fr")
print(asr_model.transcribe_file('speechbrain/asr-wav2vec2-commonvoice-fr/example-fr.wav'))
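
When several recordings need transcribing, the single-file call above can be wrapped in a small loop. The helper below is a sketch, not part of SpeechBrain: `transcribe_folder` and its `pattern` parameter are hypothetical, and the only assumption about `asr_model` is that it exposes the `transcribe_file(path)` method used above.

```python
from pathlib import Path


def transcribe_folder(asr_model, folder, pattern="*.wav"):
    """Return {file name: transcript} for every file in `folder` matching `pattern`."""
    results = {}
    # Sort the paths so the output order is stable across runs.
    for audio_path in sorted(Path(folder).glob(pattern)):
        results[audio_path.name] = asr_model.transcribe_file(str(audio_path))
    return results
```

With the model loaded as shown earlier, `transcribe_folder(asr_model, "my_recordings")` would return one transcript per WAV file, where `my_recordings` is a placeholder directory name.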

Advanced usage

To run inference on a GPU, add run_opts={"device":"cuda"} when calling the from_hparams method.

from speechbrain.inference.ASR import EncoderASR

asr_model = EncoderASR.from_hparams(source="speechbrain/asr-wav2vec2-commonvoice-fr", savedir="pretrained_models/asr-wav2vec2-commonvoice-fr", run_opts={"device":"cuda"})
asr_model.transcribe_file('speechbrain/asr-wav2vec2-commonvoice-fr/example-fr.wav')
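
Hard-coding "cuda" fails on machines without a GPU. One possible way to keep the same script usable everywhere is to choose the device at run time, as in the sketch below. `pick_run_opts` is a hypothetical helper, not a SpeechBrain function; it relies only on `torch.cuda.is_available()` and falls back to the CPU when PyTorch cannot be imported.

```python
def pick_run_opts():
    """Return run_opts targeting CUDA when PyTorch reports a usable GPU, else CPU."""
    try:
        import torch  # imported lazily so the helper also works without PyTorch
        device = "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        device = "cpu"
    return {"device": device}
```

The result can be passed straight to the loader, for example `EncoderASR.from_hparams(..., run_opts=pick_run_opts())`.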

🔧 Technical details

Training steps

The model was trained with SpeechBrain. To train it from scratch, follow these steps:

Clone the SpeechBrain repository:

git clone https://github.com/speechbrain/speechbrain/

Install the dependencies:

cd speechbrain
pip install -r requirements.txt
pip install -e .

Run the training script:

cd recipes/CommonVoice/ASR/CTC/
python train_with_wav2vec.py hparams/train_fr_with_wav2vec.yaml --data_folder=your_data_folder

You can find our training results (models, logs, etc.) here.
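
Before starting a long training run, it can help to confirm that the folder passed as --data_folder looks like a CommonVoice language directory. The sketch below is an optional pre-flight check and not part of the recipe. `missing_split_files` is a hypothetical helper, and the list of expected files (train.tsv, dev.tsv, test.tsv) is an assumption based on the usual CommonVoice layout; only train.tsv is mentioned in this document.

```python
from pathlib import Path

# Assumed split files of a CommonVoice language folder (only train.tsv is named above).
EXPECTED_FILES = ("train.tsv", "dev.tsv", "test.tsv")


def missing_split_files(data_folder, expected=EXPECTED_FILES):
    """Return the names of expected split files that are absent from data_folder."""
    folder = Path(data_folder)
    return [name for name in expected if not (folder / name).is_file()]
```

An empty list means every expected file was found. Otherwise the returned names show what is missing from the folder.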

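Limitations

The SpeechBrain team does not provide any warranty on the performance of this model when it is used on other datasets.

📚 Documentation

Citing SpeechBrain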

@misc{SB2021,
    author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Ju-Chieh, Chou and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua },
    title = {SpeechBrain},
    year = {2021},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/speechbrain/speechbrain}},
  }