wav2vec2-large-100k-voxpopuli Portuguese Speech Recognition Model

Wav2vec2 Large 100k Voxpopuli Ft Common Voice Plus TTS Dataset Plus Data Augmentation Portuguese

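Developed by Edresson

A Portuguese speech recognition model based on Facebook's Wav2vec2 Large 100k Voxpopuli, fine-tuned on Common Voice 7.0 and a TTS Portuguese dataset with data augmentation.

Speech recognition

Transformers

License: Apache-2.0 #Portuguese speech recognition #Data augmentation #Multi-corpus training

Downloads: 22

Released: 3/2/2022

Model Overview

The model targets Portuguese speech recognition. Fine-tuning with data augmentation and an additional TTS dataset improves its recognition accuracy.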

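Model Features

Fine-tuning with data augmentation

TTS-generated data and voice conversion are used for data augmentation to improve model performance.

Multi-dataset training

Trained on Common Voice 7.0 combined with a dedicated TTS Portuguese dataset.

Recognition accuracy

Achieves a word error rate (WER) of 20.20% on the Common Voice 7.0 test set.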

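Model Capabilities

Portuguese speech recognition

Audio-to-text conversion

Automatic speech recognition

Use Cases

Speech transcription

Portuguese speech-to-text

Converts spoken Portuguese into written text.

WER: 20.20%

Voice assistants

Portuguese voice command recognition

Recognizes spoken commands in Portuguese voice assistant systems.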

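🚀 Wav2vec2 Large 100k Voxpopuli Fine-Tuned for Portuguese

This project fine-tunes the Wav2vec2 Large 100k Voxpopuli model for Portuguese automatic speech recognition using Common Voice 7.0, the TTS-Portuguese Corpus, and data augmentation, with the aim of improving recognition accuracy.

🚀 Quick Start

This model is a version of Wav2vec2 Large 100k Voxpopuli fine-tuned for Portuguese on Common Voice 7.0 and the TTS-Portuguese Corpus, with data augmentation based on TTS and voice conversion.

💻 Usage Examples

Basic usage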

from transformers import AutoTokenizer, Wav2Vec2ForCTC

# Hugging Face Hub identifier of the fine-tuned checkpoint
MODEL_ID = "Edresson/wav2vec2-large-100k-voxpopuli-ft-Common_Voice_plus_TTS-Dataset_plus_Data_Augmentation-portuguese"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)
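The snippet above only loads the tokenizer and the CTC model. To illustrate what happens to the model's per-frame scores afterwards, here is a minimal sketch of greedy CTC decoding in plain Python: pick the best token per frame, merge consecutive repeats, and drop the blank token. The function name `greedy_ctc_decode`, the toy vocabulary, the blank index, and the `|` word delimiter are assumptions made for illustration; they are not taken from this model's actual vocabulary, and in practice the tokenizer's own decoding would normally be used instead.

```python
from typing import Dict, List, Optional, Sequence


def greedy_ctc_decode(
    frame_logits: Sequence[Sequence[float]],
    id_to_token: Dict[int, str],
    blank_id: int = 0,
    word_delimiter: str = "|",
) -> str:
    """Greedy CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Pick the highest-scoring token id for every frame.
    best_ids: List[int] = [
        max(range(len(frame)), key=frame.__getitem__) for frame in frame_logits
    ]

    tokens: List[str] = []
    previous_id: Optional[int] = None
    for token_id in best_ids:
        # Emit a token only when it differs from the previous frame
        # and is not the blank symbol.
        if token_id != previous_id and token_id != blank_id:
            tokens.append(id_to_token[token_id])
        previous_id = token_id

    # Assumed convention: the delimiter token marks word boundaries.
    return "".join(tokens).replace(word_delimiter, " ").strip()


# Toy vocabulary (hypothetical, not the model's real vocabulary).
TOY_VOCAB = {0: "<pad>", 1: "|", 2: "o", 3: "l", 4: "á"}

# One row of scores per audio frame; the largest value wins.
toy_logits = [
    [0.1, 0.0, 0.9, 0.0, 0.0],  # o
    [0.1, 0.0, 0.9, 0.0, 0.0],  # o again (merged with the previous frame)
    [0.9, 0.0, 0.0, 0.1, 0.0],  # blank
    [0.0, 0.0, 0.1, 0.9, 0.0],  # l
    [0.0, 0.0, 0.0, 0.1, 0.9],  # á
    [0.0, 0.9, 0.0, 0.1, 0.0],  # word delimiter
]

decoded = greedy_ctc_decode(toy_logits, TOY_VOCAB)
print(decoded)
```

A blank frame between two identical tokens keeps them separate, which is how CTC can represent doubled letters.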

Advanced usage

Example: testing on the Common Voice dataset

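import re

import torchaudio
from datasets import load_dataset

# NOTE: chars_to_ignore_regex, map_to_pred, and wer are not defined in this
# snippet; they must be defined before running it.

# Portuguese ("pt") test split, read from a local Common Voice 7.0 download
dataset = load_dataset("common_voice", "pt", split="test", data_dir="./cv-corpus-7.0-2021-07-21")

# Resample from 48 kHz to the 16 kHz rate used by the model
resampler = torchaudio.transforms.Resample(orig_freq=48_000, new_freq=16_000)

def map_to_array(batch):
    # Load the clip, resample it, and normalize the reference transcript
    speech, _ = torchaudio.load(batch["path"])
    batch["speech"] = resampler(speech.squeeze(0)).numpy()
    batch["sampling_rate"] = resampler.new_freq
    batch["sentence"] = re.sub(chars_to_ignore_regex, '', batch["sentence"]).lower().replace("’", "'")
    return batch

ds = dataset.map(map_to_array)
# Run inference one clip at a time and keep only the prediction columns
result = ds.map(map_to_pred, batched=True, batch_size=1, remove_columns=list(ds.features.keys()))
print(wer.compute(predictions=result["predicted"], references=result["target"]))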
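The evaluation snippet above uses `chars_to_ignore_regex` without defining it. The sketch below isolates the transcript normalization step so it can be tried on its own. The pattern in `CHARS_TO_IGNORE_REGEX` is a hypothetical placeholder: the source does not show which characters the author actually ignored, so it should be treated as an assumption and adjusted as needed. The helper name `normalize_sentence` is likewise illustrative.

```python
import re

# Hypothetical pattern: the source does not show the author's actual list
# of ignored characters.
CHARS_TO_IGNORE_REGEX = r'[,?.!;:"“”]'


def normalize_sentence(sentence: str) -> str:
    """Strip ignored punctuation, lowercase, and unify apostrophes."""
    cleaned = re.sub(CHARS_TO_IGNORE_REGEX, "", sentence)
    # Replace the typographic apostrophe with the plain ASCII one.
    return cleaned.lower().replace("’", "'")


print(normalize_sentence("Olá, tudo bem?"))
```

Applying the same normalization to both references and predictions keeps punctuation and casing differences from being counted as recognition errors.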

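📚 Documentation

For detailed results, see the paper.

📄 License

This model is released under the Apache-2.0 license.

📦 Model Information

Property	Details
Model type	Wav2vec2 Large 100k Voxpopuli fine-tuned for Portuguese
Training data	Common Voice 7.0, TTS-Portuguese Corpus
Evaluation metric	Word error rate (WER)
Tags	audio, speech, wav2vec2, Portuguese, Portuguese speech corpus, automatic speech recognition, PyTorch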