wav2vec2-large-100k-voxpopuli Portuguese Speech Recognition Model

Wav2vec2 Large 100k Voxpopuli Ft Common Voice Plus TTS Dataset Plus Data Augmentation Portuguese

Developed by Edresson

A Portuguese speech recognition model based on Facebook's Wav2vec2 Large 100k Voxpopuli, fine-tuned on Common Voice 7.0 and a Portuguese TTS dataset with data augmentation.

Speech Recognition

Transformers

License: Apache-2.0 #Portuguese speech recognition #Data augmentation #Multi-corpus training

Downloads: 22

Released: 3/2/2022

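Model Overview

This model targets Portuguese speech recognition. It was fine-tuned with data augmentation and an additional TTS dataset to improve recognition accuracy.

Model Features

Fine-tuning with data augmentation

Training data was augmented with TTS-generated speech and voice conversion to improve model performance.

Multi-dataset training

Trained on Common Voice 7.0 together with a dedicated Portuguese TTS dataset.

High-performance recognition

Reaches a word error rate (WER) of 20.20% on the Common Voice 7.0 test set.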

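Model Capabilities

Portuguese speech recognition

Audio-to-text conversion

Automatic speech recognition

Use Cases

Speech transcription

Portuguese speech-to-text

Converts spoken Portuguese content into text.

Word error rate: 20.20%

Voice assistants

Portuguese voice command recognition

Recognizes spoken commands in Portuguese voice assistant systems.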

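🚀 Wav2vec2 Large 100k Voxpopuli fine-tuned for Portuguese

This project fine-tunes the Wav2vec2 Large 100k Voxpopuli model for Portuguese automatic speech recognition, using Common Voice 7.0, the TTS-Portuguese Corpus, and data augmentation to improve recognition quality.

🚀 Quick Start

This model is a version of Wav2vec2 Large 100k Voxpopuli fine-tuned for Portuguese on Common Voice 7.0 and the TTS-Portuguese Corpus, with data augmentation based on TTS and voice conversion.

💻 Usage Examples

Basic usage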

from transformers import AutoTokenizer, Wav2Vec2ForCTC

# Hugging Face Hub ID of the fine-tuned checkpoint
model_id = "Edresson/wav2vec2-large-100k-voxpopuli-ft-Common_Voice_plus_TTS-Dataset_plus_Data_Augmentation-portuguese"

# Load the tokenizer and the CTC model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)
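
A CTC model such as Wav2Vec2ForCTC produces one score vector per audio frame, and those scores still have to be turned into text. The sketch below illustrates greedy CTC decoding in plain Python, independent of the library: pick the best token per frame, merge consecutive repeats, then drop blank tokens. The function name `ctc_greedy_decode` and the toy vocabulary are hypothetical. Treating `<pad>` as the blank and `|` as the word delimiter is an assumption that follows common Wav2Vec2 tokenizer defaults and may not match this checkpoint's vocabulary.

```python
from typing import List, Sequence


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]],
    vocab: List[str],
    blank: str = "<pad>",        # assumed blank token
    word_delimiter: str = "|",   # assumed word-boundary token
) -> str:
    """Greedy CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Pick the index of the highest-scoring token for every frame.
    best = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]

    tokens = []
    previous = None
    for idx in best:
        # Consecutive identical predictions count as a single emitted token.
        if idx != previous:
            tokens.append(vocab[idx])
        previous = idx

    # Blanks are removed only after merging, so "l", blank, "l" stays "ll".
    text = "".join(token for token in tokens if token != blank)
    # Turn word delimiters into single spaces and trim the ends.
    return " ".join(text.replace(word_delimiter, " ").split())
```

For real transcription, the tokenizer's own decoding should be preferred; this sketch only shows the idea behind it.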

Advanced usage

Example evaluation on the Common Voice dataset

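import re

import torchaudio
from datasets import load_dataset

# `chars_to_ignore_regex`, `map_to_pred`, and `wer` are not defined in this
# snippet; define them before running it.

# Load the Portuguese ("pt") test split from a local Common Voice 7.0 copy
dataset = load_dataset("common_voice", "pt", split="test", data_dir="./cv-corpus-7.0-2021-07-21")

# Resample from 48 kHz to the 16 kHz rate the model expects
resampler = torchaudio.transforms.Resample(orig_freq=48_000, new_freq=16_000)

def map_to_array(batch):
    # Load the clip, resample it, and normalize the reference transcript
    speech, _ = torchaudio.load(batch["path"])
    batch["speech"] = resampler(speech.squeeze(0)).numpy()
    batch["sampling_rate"] = resampler.new_freq
    batch["sentence"] = re.sub(chars_to_ignore_regex, '', batch["sentence"]).lower().replace("’", "'")
    return batch

ds = dataset.map(map_to_array)
# Transcribe each clip, then compute the word error rate
result = ds.map(map_to_pred, batched=True, batch_size=1, remove_columns=list(ds.features.keys()))
print(wer.compute(predictions=result["predicted"], references=result["target"]))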
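
The evaluation example normalizes each reference sentence before scoring: it strips characters matched by `chars_to_ignore_regex`, lowercases the text, and replaces the typographic apostrophe with a plain one. The source does not show the pattern itself, so the sketch below uses an illustrative punctuation set. Both `CHARS_TO_IGNORE_REGEX` and `normalize_sentence` are hypothetical names, and the pattern is an assumption, not the one used to produce the reported results.

```python
import re

# Assumption: illustrative punctuation set; the model card does not show the real pattern.
CHARS_TO_IGNORE_REGEX = r'[,?.!;:"“”]'


def normalize_sentence(sentence: str, pattern: str = CHARS_TO_IGNORE_REGEX) -> str:
    """Strip ignored characters, lowercase, and unify apostrophes."""
    # Remove every character matched by the ignore pattern.
    cleaned = re.sub(pattern, "", sentence)
    # Lowercase, then map the right single quotation mark (U+2019) to "'".
    return cleaned.lower().replace("\u2019", "'")
```

Whatever pattern is chosen, applying the same normalization to every reference keeps WER figures comparable across runs.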

📚 Documentation

For detailed results of the model, see the paper.

📄 License

This model is licensed under Apache-2.0.

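📦 Model Information

Property	Details
Model type	Wav2vec2 Large 100k Voxpopuli fine-tuned for Portuguese
Training data	Common Voice 7.0, TTS-Portuguese Corpus
Evaluation metric	Word error rate (WER)
Tags	audio, speech, wav2vec2, Portuguese, Portuguese speech corpus, automatic speech recognition, PyTorch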
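
To make the evaluation metric concrete, here is a minimal sketch of word error rate in plain Python: the word-level edit distance (substitutions, insertions, deletions) between prediction and reference, divided by the number of reference words. The function names `word_edit_distance` and `corpus_wer` are hypothetical. The sketch pools errors over all sentences, and it may differ from a metric library's output in edge cases such as empty references.

```python
from typing import List, Sequence


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance between two word sequences."""
    # previous[j] holds the distance between the processed ref prefix and hyp[:j].
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion of a reference word
                current[j - 1] + 1,      # insertion of a hypothesis word
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def corpus_wer(references: Sequence[str], predictions: Sequence[str]) -> float:
    """Total word errors divided by total reference words."""
    if len(references) != len(predictions):
        raise ValueError("references and predictions must have the same length")
    errors = 0
    total_words = 0
    for ref, hyp in zip(references, predictions):
        ref_words = ref.split()
        errors += word_edit_distance(ref_words, hyp.split())
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("references contain no words")
    return errors / total_words
```

Under this definition, a WER of 20.20% corresponds to roughly one word error for every five reference words.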