Wav2vec2 Large Ru Golos With Lm

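Developed by bond005

A Russian speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the Sberdevices Golos dataset, with an integrated 2-gram language model to improve recognition accuracy.

Speech Recognition

Transformers

License: Apache-2.0 #Russian speech recognition #low word error rate #far-field speech processing

Downloads: 434

Released: 9/26/2022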

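Model Overview

The model is built for Russian speech recognition. It expects audio sampled at 16 kHz and was evaluated on several Russian test sets, with WER ranging from 6.883% (Sberdevices Golos, crowd) to 20.652% (SOVA RuDevices).

Model Features

Integrated language model

A 2-gram language model built from a Russian text corpus is integrated to improve recognition accuracy.

Training with data augmentation

Pitch shifting, speeding up and slowing down of the sound, and reverberation were applied during training to make the model more robust.

Evaluation on multiple datasets

The model was evaluated on Sberdevices Golos, Common Voice (Russian), and several other test sets.

Model Capabilities

Russian speech recognition

Audio transcription

Speech-to-text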

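Use Cases

Voice assistants

Smart home control

Recognition of voice commands for Russian-language smart home devices

CER of 5.128% on the Sberdevices Golos far-field test set

Speech transcription

Meeting transcription

Automatic transcription of Russian meeting recordings into text

WER of 6.883% on the Sberdevices Golos crowd test set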

language: ru
datasets:
- SberDevices/Golos
- common_voice
- bond005/rulibrispeech
- bond005/sova_rudevices
metrics:
- wer
- cer
tags:
- audio
- automatic-speech-recognition
- speech
- common_voice
- SberDevices/Golos
- bond005/rulibrispeech
- bond005/sova_rudevices
- dangrebenkin/voxforge-ru-dataset
license: apache-2.0
widget:
- example_title: 'Russian test audio "нейросети это хорошо" (in English: "neural networks are good")'
  src: https://huggingface.co/bond005/wav2vec2-large-ru-golos-with-lm/resolve/main/test_sound_ru.flac
model-index:
- name: XLSR Wav2Vec2 Russian with Language Model by Ivan Bondarenko
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Sberdevices Golos (crowd)
      type: SberDevices/Golos
      args: ru
    metrics:
    - name: Test WER
      type: wer
      value: 6.883
    - name: Test CER
      type: cer
      value: 1.637
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Sberdevices Golos (farfield)
      type: SberDevices/Golos
      args: ru
    metrics:
    - name: Test WER
      type: wer
      value: 15.044
    - name: Test CER
      type: cer
      value: 5.128
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice ru
      type: common_voice
      args: ru
    metrics:
    - name: Test WER
      type: wer
      value: 12.115
    - name: Test CER
      type: cer
      value: 2.980
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Russian Librispeech
      type: bond005/rulibrispeech
      args: ru
    metrics:
    - name: Test WER
      type: wer
      value: 15.736
    - name: Test CER
      type: cer
      value: 3.573
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Sova RuDevices
      type: bond005/sova_rudevices
      args: ru
    metrics:
    - name: Test WER
      type: wer
      value: 20.652
    - name: Test CER
      type: cer
      value: 7.287
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Voxforge Ru
      type: dangrebenkin/voxforge-ru-dataset
      args: ru
    metrics:
    - name: Test WER
      type: wer
      value: 19.079
    - name: Test CER
      type: cer
      value: 5.864

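Wav2Vec2-Large-Ru-Golos-With-LM

This Wav2Vec2 model is based on facebook/wav2vec2-large-xlsr-53 and was fine-tuned on Russian using the Sberdevices Golos dataset, with audio augmentations such as pitch shifting, speeding up and slowing down of the sound, and reverberation.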
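The exact augmentation pipeline is not described in this card. As a rough illustration only, two of the listed augmentations (speed change and reverberation) could be sketched with NumPy as below. The function names `change_speed` and `add_reverb`, their parameters, and the synthetic impulse response are assumptions made for this example and are not taken from the training code. For pitch shifting without a change in duration, a library routine such as `librosa.effects.pitch_shift` would be a more realistic choice.

```python
import numpy as np


def change_speed(waveform, factor):
    """Speed up (factor > 1) or slow down (factor < 1) a mono waveform.

    Hypothetical helper: it resamples by linear interpolation, so the
    pitch changes together with the duration.
    """
    waveform = np.asarray(waveform, dtype=np.float32)
    if factor <= 0:
        raise ValueError("factor must be positive")
    if waveform.size == 0:
        return waveform
    new_length = max(1, int(round(waveform.shape[0] / factor)))
    old_idx = np.arange(waveform.shape[0])
    new_idx = np.linspace(0, waveform.shape[0] - 1, new_length)
    return np.interp(new_idx, old_idx, waveform).astype(np.float32)


def add_reverb(waveform, sr=16_000, decay_seconds=0.3, wet=0.3, seed=0):
    """Mix the signal with a copy convolved with a synthetic impulse response.

    Hypothetical helper: the impulse response is exponentially decaying
    white noise, a crude stand-in for a measured room response.
    """
    waveform = np.asarray(waveform, dtype=np.float32)
    if waveform.size == 0:
        return waveform
    rng = np.random.default_rng(seed)
    n = max(1, int(sr * decay_seconds))
    t = np.arange(n) / sr
    impulse = rng.standard_normal(n) * np.exp(-6.0 * t / decay_seconds)
    impulse /= np.sqrt(np.sum(impulse ** 2))  # normalize to unit energy
    tail = np.convolve(waveform, impulse)[: waveform.shape[0]]
    mixed = (1.0 - wet) * waveform + wet * tail
    peak = np.max(np.abs(mixed))
    if peak > 1.0:  # rescale only when the mix would clip
        mixed = mixed / peak
    return mixed.astype(np.float32)
```

Applying such transforms with randomly drawn parameters at training time is a common way to expose a model to varied acoustic conditions. Whether the author used comparable settings is not stated here.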

The 2-gram language model was built on a Russian text corpus obtained from three open sources.
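To give an idea of what a 2-gram language model contributes, the toy sketch below scores a sentence from bigram counts with add-alpha smoothing. It is an illustration under stated assumptions: the class `BigramLM`, the three-sentence corpus, and the smoothing scheme are invented for this example and do not reproduce the model shipped with this checkpoint.

```python
import math
from collections import Counter


class BigramLM:
    """Toy 2-gram language model with add-alpha smoothing (illustrative only)."""

    def __init__(self, sentences, alpha=1.0):
        self.alpha = alpha
        self.context_counts = Counter()
        self.bigram_counts = Counter()
        vocab = {"</s>"}
        for sentence in sentences:
            words = sentence.split()
            vocab.update(words)
            tokens = ["<s>"] + words + ["</s>"]
            self.context_counts.update(tokens[:-1])
            self.bigram_counts.update(zip(tokens[:-1], tokens[1:]))
        # One extra slot reserves probability mass for unseen words.
        self.vocab_size = len(vocab) + 1

    def log_prob(self, sentence):
        """Return the natural-log probability of a whitespace-tokenized sentence."""
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        total = 0.0
        for prev, word in zip(tokens[:-1], tokens[1:]):
            numerator = self.bigram_counts[(prev, word)] + self.alpha
            denominator = self.context_counts[prev] + self.alpha * self.vocab_size
            total += math.log(numerator / denominator)
        return total


# Hypothetical miniature corpus, used only to make the example runnable.
corpus = ["нейросети это хорошо", "это хорошо", "нейросети это интересно"]
lm = BigramLM(corpus)
fluent = lm.log_prob("нейросети это хорошо")
scrambled = lm.log_prob("хорошо это нейросети")
```

In this toy setting `fluent` is larger than `scrambled`, because every bigram of the first sentence was seen in the corpus. During decoding, a score of this kind is combined with the acoustic model's score so that word sequences that are more plausible in Russian are preferred.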

Usage

When using this model, make sure that your speech input is sampled at 16 kHz.
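If a recording uses a different sampling rate, it has to be resampled before it is passed to the processor. The sketch below shows the idea with plain linear interpolation in NumPy. The helper name `to_16khz` is an assumption made for this example, and a dedicated resampler such as `librosa.resample` will generally give better audio quality.

```python
import numpy as np

TARGET_SR = 16_000  # sampling rate expected by the model


def to_16khz(waveform, orig_sr, target_sr=TARGET_SR):
    """Resample a mono waveform to target_sr by linear interpolation.

    Hypothetical helper for illustration; it applies no anti-aliasing
    filter, so it is only a rough substitute for a proper resampler.
    """
    waveform = np.asarray(waveform, dtype=np.float32)
    if waveform.ndim != 1:
        raise ValueError("expected a mono (1-D) waveform")
    if orig_sr == target_sr or waveform.size == 0:
        return waveform
    new_length = int(round(waveform.shape[0] * target_sr / orig_sr))
    old_times = np.arange(waveform.shape[0]) / orig_sr
    new_times = np.arange(new_length) / target_sr
    return np.interp(new_times, old_times, waveform).astype(np.float32)
```

The returned array can then be passed to the processor together with `sampling_rate=16_000`.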

You can use this model by writing your own inference script:

import os
import warnings

import librosa
import nltk
import numpy as np

import torch
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2ProcessorWithLM

MODEL_ID = "bond005/wav2vec2-large-ru-golos-with-lm"
DATASET_ID = "bond005/sberdevices_golos_10h_crowd"
SAMPLES = 30

nltk.download('punkt')
# os.cpu_count() may return None, so fall back to a single process
num_processes = os.cpu_count() or 1

test_dataset = load_dataset(DATASET_ID, split=f"test[:{SAMPLES}]")
processor = Wav2Vec2ProcessorWithLM.from_pretrained(MODEL_ID)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)

# Preprocess the dataset:
# the audio has to be available as float32 arrays
def speech_file_to_array_fn(batch):
    speech_array = batch["audio"]["array"]
    batch["speech"] = np.asarray(speech_array, dtype=np.float32)
    return batch

removed_columns = set(test_dataset.column_names)
removed_columns -= {'transcription', 'speech'}
removed_columns = sorted(list(removed_columns))
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    test_dataset = test_dataset.map(
        speech_file_to_array_fn,
        num_proc=num_processes,
        remove_columns=removed_columns
    )

inputs = processor(test_dataset["speech"], sampling_rate=16_000,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(inputs.input_values,
                   attention_mask=inputs.attention_mask).logits
predicted_sentences = processor.batch_decode(
    logits=logits.numpy(),
    num_processes=num_processes
).text

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    for i, predicted_sentence in enumerate(predicted_sentences):
        print("-" * 100)
        print("Reference:", test_dataset[i]["transcription"])
        print("Prediction:", predicted_sentence)

A Google Colab version of this script is also available.
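For intuition about the decoding step in the script above: the acoustic model emits one vector of scores per audio frame, and a CTC decoder turns these into text. `Wav2Vec2ProcessorWithLM` does this with a beam search that also consults the language model. The sketch below shows only the simpler greedy variant without a language model: take the best token per frame, merge repeated tokens, and drop the blank token. The function name, the tiny vocabulary, and the blank index are assumptions made for this example.

```python
import numpy as np


def ctc_greedy_decode(logits, vocab, blank_id=0):
    """Greedy CTC decoding for a single utterance.

    logits: array of shape (frames, vocab_size)
    vocab:  list mapping token ids to characters (hypothetical here)
    """
    best_ids = np.argmax(logits, axis=-1)
    chars = []
    previous = None
    for token_id in best_ids:
        # Emit a token only when it differs from the previous frame
        # and is not the CTC blank symbol.
        if token_id != previous and token_id != blank_id:
            chars.append(vocab[int(token_id)])
        previous = token_id
    return "".join(chars)


# Hypothetical four-token vocabulary with the blank at index 0.
toy_vocab = ["<pad>", "д", "а", " "]
frame_ids = [1, 1, 0, 2, 2, 0, 0, 2]
toy_logits = np.eye(len(toy_vocab))[frame_ids]  # one-hot scores per frame
decoded = ctc_greedy_decode(toy_logits, toy_vocab)
```

The blank between the two runs of "а" is what allows a repeated character to survive the merging step.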

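Evaluation

This model was evaluated on the test subsets of SberDevices Golos, Common Voice 6.0 (Russian part), and Russian Librispeech, but it was trained on the training subset of SberDevices Golos only. You can find the evaluation script for other datasets, including Russian Librispeech and SOVA RuDevices, on my Kaggle page: https://www.kaggle.com/code/bond005/wav2vec2-ru-lm-eval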
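The reported metrics, word error rate (WER) and character error rate (CER), are both edit distances divided by the length of the reference. The sketch below is a minimal pure-Python version for illustration. The function names are assumptions, and the Kaggle evaluation script may normalize or tokenize text differently, so its numbers need not match this code exactly.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences (words or characters)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (0 if ref_item == hyp_item else 1)
            current.append(min(previous[j] + 1,      # deletion
                               current[j - 1] + 1,   # insertion
                               substitution))
        previous = current
    return previous[-1]


def error_rate(references, hypotheses, unit="word"):
    """Corpus-level error rate in percent.

    unit="word" gives WER, unit="char" gives CER. In this sketch,
    spaces count as characters for CER.
    """
    if unit not in ("word", "char"):
        raise ValueError("unit must be 'word' or 'char'")
    errors = 0
    length = 0
    for ref, hyp in zip(references, hypotheses):
        if unit == "word":
            ref_units, hyp_units = ref.split(), hyp.split()
        else:
            ref_units, hyp_units = list(ref), list(hyp)
        errors += edit_distance(ref_units, hyp_units)
        length += len(ref_units)
    if length == 0:
        raise ValueError("references contain no units")
    return 100.0 * errors / length
```

For example, `error_rate(references, predicted_sentences, unit="word")` on the lists from the inference script above would give a WER for those 30 samples. Such a small sample is not comparable to the full test-set figures reported for this model.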

Citation

If you want to cite this model, you can use this:

@misc{bondarenko2022wav2vec2-large-ru-golos,
  title={XLSR Wav2Vec2 Russian with 2-gram Language Model by Ivan Bondarenko},
  author={Bondarenko, Ivan},
  publisher={Hugging Face},
  journal={Hugging Face Hub},
  howpublished={\url{https://huggingface.co/bond005/wav2vec2-large-ru-golos-with-lm}},
  year={2022}
}