wav2vec2-large-xlsr-53-french_punctuation open-source model: French speech recognition with punctuation prediction

Wav2vec2 Large Xlsr 53 French Punctuation

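Developed by Ilyes

A French automatic speech recognition model based on the wav2vec2-large-xlsr-53 architecture, with support for punctuation prediction.

Speech recognition | French | Open source | License: Apache-2.0 | #French speech recognition #Automatic punctuation #XLSR fine-tuning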

Downloads: 23

Release date: 3/2/2022

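Model overview

This model is a fine-tuned version of wav2vec2-large-xlsr-53 built specifically for French speech recognition. It transcribes speech into text that already includes punctuation.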

Model features

Punctuation prediction

Automatically predicts and inserts punctuation, which makes transcripts easier to read.

Accuracy

Reaches a WER of 19.47% and a CER of 6.66% on the Common Voice French test set.

XLSR fine-tuning

Fine-tuned from the cross-lingual speech representation (XLSR) pretrained model, which provides strong speech feature extraction.

Model capabilities

French speech recognition

Automatic punctuation prediction

Speech-to-text conversion

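Use cases

Speech transcription

Meeting minutes

Automatically transcribes recordings of French-language meetings and adds punctuation.

Improves transcription efficiency and text readability.

Media subtitle generation

Generates punctuated subtitles for French video content.

Reduces the time spent on manual subtitling.

Voice assistants

French voice input

Supports recognition and processing of French voice commands.

Improves the voice interaction experience.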

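🚀 Speech recognition with wav2vec2-large-xlsr-53-French_punctuation

This model targets automatic speech recognition (ASR) and was fine-tuned on the Common Voice French dataset. It predicts both the text and its punctuation, reaching a WER of 19.47% and a CER of 6.66% on the Common Voice French test set.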

🚀 Quick start

Model evaluation code

The following code shows how to evaluate the model on the Common Voice French test set.

import re
import torch
import torchaudio
from datasets import load_dataset, load_metric
from transformers import (
    Wav2Vec2ForCTC,
    Wav2Vec2Processor,
)

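model_name = "Ilyes/wav2vec2-large-xlsr-53-french_punctuation"

# Use the GPU when one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = Wav2Vec2ForCTC.from_pretrained(model_name).to(device)
processor = Wav2Vec2Processor.from_pretrained(model_name)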

ds = load_dataset("common_voice", "fr", split="test")

chars_to_ignore_regex = '[\;\:\"\“\%\‘\”\�\‘\’\’\’\‘\…\·\ǃ\«\‹\»\›“\”\\ʿ\ʾ\„\∞\\|\;\:\*\—\–\─\―\_\/\:\ː\;\=\«\»\→]'
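def normalize_text(text):
    """Lowercase a transcript and unify ligatures, apostrophes and hyphens."""
    text = text.lower().strip()
    # Expand ligatures.
    text = re.sub('œ', 'oe', text)
    text = re.sub('æ', 'ae', text)
    # Map apostrophe look-alikes to a plain ASCII apostrophe.
    text = re.sub("’|´|′|ʼ|‘|ʻ|`", "'", text)
    # Drop apostrophes that sit next to a space or at the end of the text.
    text = re.sub("'+ ", " ", text)
    text = re.sub(" '+", " ", text)
    text = re.sub("'$", " ", text)
    text = re.sub("' ", " ", text)
    # Unify hyphen variants, then remove hyphens that touch a space
    # (together with that space).
    text = re.sub("−|‐", "-", text)
    text = re.sub(" -", "", text)
    text = re.sub("- ", "", text)
    # Strip the remaining ignored symbols.
    text = re.sub(chars_to_ignore_regex, '', text)
    return text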

# The script assumes 48 kHz source audio; the model expects 16 kHz input.
# The resampler has to exist before map_to_array is applied.
resampler = torchaudio.transforms.Resample(48_000, 16_000)

def map_to_array(batch):
    speech, _ = torchaudio.load(batch["path"])
    batch["speech"] = resampler(speech.squeeze(0)).numpy()
    batch["sampling_rate"] = resampler.new_freq
    batch["sentence"] = normalize_text(batch["sentence"])
    return batch

ds = ds.map(map_to_array)

def map_to_pred(batch):
    features = processor(batch["speech"], sampling_rate=batch["sampling_rate"][0], padding=True, return_tensors="pt")
    # Run on whichever device the model lives on.
    input_values = features.input_values.to(model.device)
    attention_mask = features.attention_mask.to(model.device)
    with torch.no_grad():
        logits = model(input_values, attention_mask=attention_mask).logits
    # Greedy decoding: most likely token per frame, then collapse to text.
    pred_ids = torch.argmax(logits, dim=-1)
    batch["predicted"] = processor.batch_decode(pred_ids)
    # With batched=True, batch["sentence"] is a list, so clean each reference
    # individually: collapse runs of repeated punctuation marks.
    targets = []
    for sentence in batch["sentence"]:
        sentence = re.sub(r'\.+', '.', sentence)
        sentence = re.sub(r'\?+', '?', sentence)
        sentence = re.sub(r'!+', '!', sentence)
        sentence = re.sub(r',+', ',', sentence)
        targets.append(sentence)
    batch["target"] = targets
    return batch

result = ds.map(map_to_pred, batched=True, batch_size=16, remove_columns=list(ds.features.keys()))
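# Note: load_metric is deprecated in recent releases of datasets;
# the evaluate library's evaluate.load("wer") is its replacement.
wer = load_metric("wer")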
print(wer.compute(predictions=result["predicted"], references=result["target"]))

💻 Usage examples

Basic usage

The code above shows the basic usage for evaluating the model.
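
The evaluation code turns logits into text with `torch.argmax` followed by `processor.batch_decode`, which amounts to greedy CTC decoding. The sketch below illustrates that idea in plain Python. The vocabulary, the ids, and the helper name `greedy_ctc_decode` are hypothetical and chosen only for illustration; the real model ships its own vocabulary with the processor, which presumably includes the punctuation marks it predicts.

```python
from itertools import groupby

# Hypothetical toy vocabulary, for illustration only. "|" marks a word
# boundary and "<pad>" doubles as the CTC blank, as in Wav2Vec2 CTC tokenizers.
VOCAB = ["<pad>", "|", "o", "h", "!", "t", "u"]
BLANK_ID = 0
WORD_DELIMITER = "|"


def greedy_ctc_decode(frame_ids):
    """Collapse repeated frame labels, drop blanks, and join into text."""
    # 1. Merge consecutive duplicates: one label can span several audio frames.
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # 2. Remove the blank symbol, which separates genuinely repeated characters.
    tokens = [VOCAB[i] for i in collapsed if i != BLANK_ID]
    # 3. Turn the word delimiter into a space.
    return "".join(tokens).replace(WORD_DELIMITER, " ").strip()


# Made-up per-frame argmax ids, standing in for torch.argmax(logits, dim=-1)
# on a single clip.
frame_ids = [2, 2, 0, 3, 3, 4, 0, 1, 1, 5, 0, 6, 6]
print(greedy_ctc_decode(frame_ids))
```

Because punctuation marks are ordinary vocabulary entries in this sketch, they come out of the same decoding pass as the letters; no separate punctuation step is needed.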

📚 Detailed documentation

Example evaluation results

Reference	Prediction
il vécut à new york et y enseigna une grande partie de sa vie.	il a vécu à new york et y enseigna une grande partie de sa vie.
au classement par nations, l'allemagne est la tenante du titre.	au classement der nation l'allemagne est la tenante du titre.
voici un petit calcul pour fixer les idées.	voici un petit calcul pour fixer les idées.
oh! tu dois être beau avec	oh! tu dois être beau avec.
babochet vous le voulez?	baboche, vous le voulez?
la commission est, par conséquent, défavorable à cet amendement.	la commission est, par conséquent, défavorable à cet amendement.
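
The evaluation script reports only WER, while the model card also quotes a CER. As a rough illustration of how both numbers can be derived from reference/prediction pairs like the ones above, here is a minimal pure-Python sketch based on Levenshtein distance. The helper names (`edit_distance`, `error_rate`, `wer`, `cer`) are hypothetical, and the sketch pools errors over all pairs before dividing; it is not the metric implementation used to produce the published scores.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    previous = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        current = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def error_rate(pairs, tokenize):
    """Total edit distance divided by total reference length over all pairs."""
    errors = sum(edit_distance(tokenize(r), tokenize(h)) for r, h in pairs)
    length = sum(len(tokenize(r)) for r, _ in pairs)
    return errors / length


def wer(pairs):
    # Word error rate: tokens are whitespace-separated words.
    return error_rate(pairs, str.split)


def cer(pairs):
    # Character error rate: tokens are single characters, spaces included.
    return error_rate(pairs, list)


# Two (reference, prediction) rows taken from the table above.
pairs = [
    ("voici un petit calcul pour fixer les idées.",
     "voici un petit calcul pour fixer les idées."),
    ("babochet vous le voulez?", "baboche, vous le voulez?"),
]
print(f"WER: {wer(pairs):.3f}  CER: {cer(pairs):.3f}")
```

Since punctuation stays attached to the words, a missing or wrong punctuation mark counts as a word error. That is worth keeping in mind when comparing this model's WER with scores from models evaluated on unpunctuated text.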