language: ru
datasets:
- SberDevices/Golos
- common_voice
- bond005/rulibrispeech
- bond005/sova_rudevices
metrics:
- wer
- cer
tags:
- audio
- automatic-speech-recognition
- speech
- common_voice
- SberDevices/Golos
- bond005/rulibrispeech
- bond005/sova_rudevices
- dangrebenkin/voxforge-ru-dataset
license: apache-2.0
widget:
- example_title: Russian test audio "нейросети это хорошо" (meaning "neural networks are good")
  src: https://huggingface.co/bond005/wav2vec2-large-ru-golos-with-lm/resolve/main/test_sound_ru.flac
model-index:
- name: XLSR Wav2Vec2 Russian with Language Model by Ivan Bondarenko
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Sberdevices Golos (crowd)
      type: SberDevices/Golos
      args: ru
    metrics:
    - name: Test WER
      type: wer
      value: 6.883
    - name: Test CER
      type: cer
      value: 1.637
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Sberdevices Golos (farfield)
      type: SberDevices/Golos
      args: ru
    metrics:
    - name: Test WER
      type: wer
      value: 15.044
    - name: Test CER
      type: cer
      value: 5.128
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice ru
      type: common_voice
      args: ru
    metrics:
    - name: Test WER
      type: wer
      value: 12.115
    - name: Test CER
      type: cer
      value: 2.980
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Russian Librispeech
      type: bond005/rulibrispeech
      args: ru
    metrics:
    - name: Test WER
      type: wer
      value: 15.736
    - name: Test CER
      type: cer
      value: 3.573
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: SOVA RuDevices
      type: bond005/sova_rudevices
      args: ru
    metrics:
    - name: Test WER
      type: wer
      value: 20.652
    - name: Test CER
      type: cer
      value: 7.287
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Voxforge Ru
      type: dangrebenkin/voxforge-ru-dataset
      args: ru
    metrics:
    - name: Test WER
      type: wer
      value: 19.079
    - name: Test CER
      type: cer
      value: 5.864
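Wav2Vec2 Large Russian Golos with Language Model
This Wav2Vec2 model is based on facebook/wav2vec2-large-xlsr-53 and was fine-tuned for Russian on the Sberdevices Golos dataset, with audio augmentations such as pitch shifting, speeding up or slowing down the sound, and reverberation.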
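To make the augmentation types named above more concrete, here is a minimal NumPy sketch of two of them, speed change and reverberation. It is not the pipeline used to train this model, which is not described here; the function names `change_speed` and `add_reverb`, the decay constant, and the synthetic impulse response are all assumptions chosen for illustration. Pitch shifting is omitted (libraries such as librosa provide it via `librosa.effects.pitch_shift`).

```python
import numpy as np


def change_speed(signal, factor):
    """Speed audio up (factor > 1) or slow it down (factor < 1) by resampling.

    Linear interpolation keeps the sketch short; note that this simple
    approach also shifts the pitch along with the tempo.
    """
    n_out = max(1, int(round(len(signal) / factor)))
    source_positions = np.arange(len(signal))
    target_positions = np.linspace(0, len(signal) - 1, num=n_out)
    return np.interp(target_positions, source_positions, signal).astype(np.float32)


def add_reverb(signal, sample_rate=16_000, decay_seconds=0.3, seed=0):
    """Convolve with a synthetic, exponentially decaying noise impulse response."""
    rng = np.random.default_rng(seed)
    n_taps = int(sample_rate * decay_seconds)
    tap_times = np.arange(n_taps) / sample_rate
    # Hypothetical impulse response: quiet noise tail with exponential decay.
    impulse_response = 0.05 * rng.standard_normal(n_taps) * np.exp(-6.0 * tap_times / decay_seconds)
    impulse_response[0] = 1.0  # keep the direct (dry) sound
    wet = np.convolve(signal, impulse_response)[: len(signal)]
    peak = np.max(np.abs(wet))
    # Peak-normalize so the result stays within [-1, 1].
    return (wet / peak if peak > 0 else wet).astype(np.float32)


sample_rate = 16_000
time_axis = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * time_axis).astype(np.float32)  # 1 s test tone

faster = change_speed(tone, 1.1)
reverberant = add_reverb(tone, sample_rate)
print(len(tone), len(faster), len(reverberant))
```

In practice such transforms are applied with randomly drawn parameters to each training utterance, so the model sees varied versions of the same recording.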
The 2-gram language model was built on a Russian text corpus obtained from three open sources.
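As a rough illustration of what a 2-gram model stores, the toy sketch below estimates bigram probabilities by plain maximum likelihood from a tiny corpus. The corpus, the `<s>` and `</s>` boundary tokens, and the function names are assumptions made for illustration; the toolkit and any smoothing used for this model's actual language model are not described here.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Count contexts and bigrams over whitespace-tokenized sentences."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        # Hypothetical boundary tokens marking sentence start and end.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for previous, current in zip(tokens, tokens[1:]):
            context_counts[previous] += 1
            bigram_counts[(previous, current)] += 1
    return context_counts, bigram_counts


def bigram_probability(model, previous, current):
    """Maximum-likelihood estimate of P(current | previous); 0.0 if unseen."""
    context_counts, bigram_counts = model
    if context_counts[previous] == 0:
        return 0.0
    return bigram_counts[(previous, current)] / context_counts[previous]


toy_corpus = [
    "нейросети это хорошо",
    "нейросети это интересно",
    "это хорошо",
]
model = train_bigram_model(toy_corpus)
# "это" occurs three times as a context and is followed by "хорошо" twice.
print(bigram_probability(model, "это", "хорошо"))
```

During decoding, scores of this kind are combined with the acoustic model's output so that more plausible word sequences are preferred.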
Usage
When using this model, make sure that your speech input is sampled at 16 kHz.
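One quick way to verify the sampling rate before inference is to read it from the file header. The sketch below uses only Python's standard `wave` module, so it assumes uncompressed PCM WAV input (it will not read FLAC or MP3); the helper names `wav_sample_rate` and `needs_resampling` are hypothetical.

```python
import wave

EXPECTED_SAMPLE_RATE = 16_000  # rate the model expects, per the note above


def wav_sample_rate(path):
    """Return the sample rate (Hz) stored in the header of a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path, expected_rate=EXPECTED_SAMPLE_RATE):
    """True if the file's sample rate differs from the expected one."""
    return wav_sample_rate(path) != expected_rate
```

If a file does need resampling, or is stored in another format, `librosa.load(path, sr=16000)` loads the audio and resamples it to 16 kHz in one step.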
You can use this model by writing your own inference script:
import os
import warnings

import librosa
import nltk
import numpy as np
import torch
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2ProcessorWithLM

MODEL_ID = "bond005/wav2vec2-large-ru-golos-with-lm"
DATASET_ID = "bond005/sberdevices_golos_10h_crowd"
SAMPLES = 30

nltk.download('punkt')
# os.cpu_count() may return None, so fall back to a single process.
num_processes = os.cpu_count() or 1

test_dataset = load_dataset(DATASET_ID, split=f"test[:{SAMPLES}]")
processor = Wav2Vec2ProcessorWithLM.from_pretrained(MODEL_ID)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)


# Convert each decoded audio clip to a float32 array.
def speech_file_to_array_fn(batch):
    speech_array = batch["audio"]["array"]
    batch["speech"] = np.asarray(speech_array, dtype=np.float32)
    return batch


# Keep only the reference transcription and the prepared speech array.
removed_columns = set(test_dataset.column_names)
removed_columns -= {'transcription', 'speech'}
removed_columns = sorted(list(removed_columns))
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    test_dataset = test_dataset.map(
        speech_file_to_array_fn,
        num_proc=num_processes,
        remove_columns=removed_columns
    )

inputs = processor(test_dataset["speech"], sampling_rate=16_000,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(inputs.input_values,
                   attention_mask=inputs.attention_mask).logits

# Decode the logits with the language model attached to the processor.
predicted_sentences = processor.batch_decode(
    logits=logits.numpy(),
    num_processes=num_processes
).text

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    for i, predicted_sentence in enumerate(predicted_sentences):
        print("-" * 100)
        print("Reference:", test_dataset[i]["transcription"])
        print("Prediction:", predicted_sentence)
A Google Colab version of this script is also available.
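Evaluation
This model was evaluated on the test subsets of SberDevices Golos, Common Voice 6.0 (Russian part), and Russian Librispeech, but it was trained only on the training subset of SberDevices Golos. You can find the evaluation script for the other datasets, including Russian Librispeech and SOVA RuDevices, on my Kaggle page: https://www.kaggle.com/code/bond005/wav2vec2-ru-lm-eval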
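For readers unfamiliar with the two reported metrics, the sketch below computes word error rate (WER) and character error rate (CER) as edit distance divided by reference length, expressed as a percentage. It is a plain-Python illustration, not the evaluation script linked above; text normalization, and whether spaces count as characters for CER, are assumptions here and may differ from the actual evaluation.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences (lists or strings)."""
    previous_row = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current_row = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            substitution = previous_row[j - 1] + (ref_item != hyp_item)
            deletion = previous_row[j] + 1
            insertion = current_row[j - 1] + 1
            current_row.append(min(substitution, deletion, insertion))
        previous_row = current_row
    return previous_row[-1]


def error_rate(references, hypotheses, unit="word"):
    """Corpus-level WER (unit="word") or CER (unit="char") in percent."""
    total_errors = 0
    total_length = 0
    for reference, hypothesis in zip(references, hypotheses):
        if unit == "word":
            ref_units, hyp_units = reference.split(), hypothesis.split()
        else:
            # Assumption: spaces are counted as characters.
            ref_units, hyp_units = list(reference), list(hypothesis)
        total_errors += edit_distance(ref_units, hyp_units)
        total_length += len(ref_units)
    if total_length == 0:
        raise ValueError("references must contain at least one unit")
    return 100.0 * total_errors / total_length


references = ["нейросети это хорошо"]
hypotheses = ["нейросети это хорош"]
print(f"WER: {error_rate(references, hypotheses, 'word'):.3f}%")  # 1 of 3 words wrong
print(f"CER: {error_rate(references, hypotheses, 'char'):.3f}%")  # 1 of 20 characters wrong
```

A single dropped letter costs a full word in WER but only one character in CER, which is why CER figures are typically much lower than WER figures for the same output.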
Citation
If you want to cite this model, you can use the following:
@misc{bondarenko2022wav2vec2-large-ru-golos,
title={XLSR Wav2Vec2 Russian with 2-gram Language Model by Ivan Bondarenko},
author={Bondarenko, Ivan},
publisher={Hugging Face},
journal={Hugging Face Hub},
howpublished={\url{https://huggingface.co/bond005/wav2vec2-large-ru-golos-with-lm}},
year={2022}
}