wav2vec2-large-lv60-timit-asr Open-Source Speech Recognition Model - Accurate Recognition for Speech Processing


Wav2vec2 Large Lv60 Timit Asr

Developed by elgeish

A speech recognition model fine-tuned from facebook/wav2vec2-large-lv60 on the timit_asr dataset.

Speech Recognition · English · Open Source · License: Apache-2.0 · #EnglishSpeechRecognition #HighAccuracyASR #TIMITDataset

Downloads 13

Release Time : 3/2/2022

Model Overview

This is an automatic speech recognition (ASR) model fine-tuned specifically for English speech.

Model Features

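High-accuracy speech recognition

Achieves a word error rate (WER) of 13.5% on TIMIT.

No language model required

Can be used directly, without an external language model.

16kHz sampling rate

Optimized for speech input sampled at 16kHz.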
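Running without a language model means the transcript comes from greedy CTC decoding: take the most likely token for each audio frame, collapse consecutive repeats, then drop the blank token. The sketch below illustrates that idea in plain Python. The function name, the toy vocabulary, and `blank_id=0` are hypothetical; in `Wav2Vec2CTCTokenizer` the pad token plays the role of the CTC blank and `|` is the default word delimiter.

```python
from itertools import groupby
from typing import Dict, Sequence


def greedy_ctc_decode(
    frame_ids: Sequence[int],
    id_to_token: Dict[int, str],
    blank_id: int = 0,
    word_delimiter: str = "|",
) -> str:
    """Turn per-frame argmax ids into text (hypothetical helper, not the tokenizer's API)."""
    # 1. Collapse runs of identical frame predictions into a single token
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # 2. Remove the CTC blank; a blank between two identical ids keeps them as separate characters
    tokens = [id_to_token[t] for t in collapsed if t != blank_id]
    # 3. Map the word delimiter to spaces and drop empty words
    text = "".join(tokens)
    return " ".join(word for word in text.split(word_delimiter) if word)


# Toy vocabulary (hypothetical, not the model's real vocab.json)
vocab = {0: "<pad>", 1: "|", 2: "h", 3: "i", 4: "a"}
print(greedy_ctc_decode([2, 2, 0, 3, 3, 1, 1, 0, 4, 0, 4], vocab))  # → hi aa
```

The blank between the two `4` ids is what lets CTC emit a doubled letter; without it the repeats would be merged into one.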

Model Capabilities

English speech-to-text

Continuous speech recognition

Speaker-independent recognition

Use Cases

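Speech transcription

Voice note transcription

Automatically converts English voice notes to text.

Roughly 86.5% word accuracy (1 - WER on TIMIT); accuracy on real-world recordings may differ.

Meeting minutes

Automatically generates text transcripts of meeting audio.

Voice interfaces

Voice command recognition

Recognizes spoken user commands.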

🚀 Wav2Vec2-Large-LV60-TIMIT

Wav2Vec2-Large-LV60-TIMIT is facebook/wav2vec2-large-lv60 fine-tuned on the timit_asr dataset. When using this model, make sure your speech input is sampled at 16kHz.
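If a recording is not already 16kHz mono, it has to be converted before it reaches the processor. A minimal sketch, assuming NumPy and using plain linear interpolation. There is no anti-aliasing filter, so it is only suitable for quick experiments; `librosa.resample` or `torchaudio.functional.resample` are better choices in practice. The helper name `to_16k_mono` is hypothetical.

```python
import numpy as np

TARGET_SR = 16_000  # sampling rate the model expects


def to_16k_mono(audio: np.ndarray, sample_rate: int) -> np.ndarray:
    """Downmix to mono and linearly resample to 16 kHz (sketch: no anti-aliasing filter)."""
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 2:
        # soundfile.read returns multi-channel audio shaped (frames, channels)
        audio = audio.mean(axis=1)
    if sample_rate == TARGET_SR:
        return audio
    # Keep the duration constant: n_out / TARGET_SR == len(audio) / sample_rate
    n_out = int(round(len(audio) * TARGET_SR / sample_rate))
    src_times = np.arange(len(audio)) / sample_rate
    dst_times = np.arange(n_out) / TARGET_SR
    # np.interp evaluates the piecewise-linear signal at the new sample times
    return np.interp(dst_times, src_times, audio).astype(np.float32)
```

Typical use: `speech, sr = sf.read(path)`, then `speech = to_16k_mono(speech, sr)` before calling `processor(speech, sampling_rate=16000, ...)`.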

🚀 Quick Start

The model can be used directly (without a language model), as shown below:

💻 Usage Examples

Basic Usage

import soundfile as sf
import torch
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_name = "elgeish/wav2vec2-large-lv60-timit-asr"
processor = Wav2Vec2Processor.from_pretrained(model_name)
model = Wav2Vec2ForCTC.from_pretrained(model_name)
model.eval()

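# TIMIT is distributed by the LDC; newer `datasets` versions may require a local copy (data_dir=...)
dataset = load_dataset("timit_asr", split="test").shuffle().select(range(10))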
char_translations = str.maketrans({"-": " ", ",": "", ".": "", "?": ""})

def prepare_example(example):
    example["speech"], _ = sf.read(example["file"])
    example["text"] = example["text"].translate(char_translations)
    example["text"] = " ".join(example["text"].split())  # clean up whitespaces
    example["text"] = example["text"].lower()
    return example

dataset = dataset.map(prepare_example, remove_columns=["file"])
inputs = processor(dataset["speech"], sampling_rate=16000, return_tensors="pt", padding="longest")

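with torch.no_grad():
    # lv60 checkpoints should receive the attention mask for padded batches (None if the processor omits it)
    logits = model(inputs.input_values, attention_mask=inputs.get("attention_mask")).logits
    predicted_ids = torch.argmax(logits, dim=-1)
# Carried over from the fine-tuning script, where -100 marks padded label positions; a no-op for argmax ids
predicted_ids[predicted_ids == -100] = processor.tokenizer.pad_token_id
predicted_transcripts = processor.tokenizer.batch_decode(predicted_ids)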

for reference, predicted in zip(dataset["text"], predicted_transcripts):
    print("reference:", reference)
    print("predicted:", predicted)
    print("--")

Example output:

reference: the emblem depicts the acropolis all aglow
predicted: the amblum depicts the acropolis all a glo
--
reference: don't ask me to carry an oily rag like that
predicted: don't ask me to carry an oily rag like that
--
reference: they enjoy it when i audition
predicted: they enjoy it when i addition
--
reference: set aside to dry with lid on sugar bowl
predicted: set aside to dry with a litt on shoogerbowl
--
reference: a boring novel is a superb sleeping pill
predicted: a bor and novel is a suberb sleeping peel
--
reference: only the most accomplished artists obtain popularity
predicted: only the most accomplished artists obtain popularity
--
reference: he has never himself done anything for which to be hated which of us has
predicted: he has never himself done anything for which to be hated which of us has
--
reference: the fish began to leap frantically on the surface of the small lake
predicted: the fish began to leap frantically on the surface of the small lake
--
reference: or certain words or rituals that child and adult go through may do the trick
predicted: or certain words or rituals that child an adult go through may do the trick
--
reference: are your grades higher or lower than nancy's
predicted: are your grades higher or lower than nancies
--
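The WER reported for this model is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words, accumulated over the whole evaluation set. A self-contained sketch of that computation, with hypothetical helper names, that can be applied to reference/predicted pairs like the ones above:

```python
from typing import Iterable, Tuple


def word_edit_distance(reference: str, hypothesis: str) -> int:
    """Minimum word substitutions + deletions + insertions turning reference into hypothesis."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j]: distance between the reference words processed so far and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,                           # delete ref_word
                cur[j - 1] + 1,                        # insert hyp_word
                prev[j - 1] + (ref_word != hyp_word),  # substitute (cost 0 on a match)
            )
        prev = cur
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus-level WER: total edits divided by total reference words."""
    edits = words = 0
    for reference, hypothesis in pairs:
        edits += word_edit_distance(reference, hypothesis)
        words += len(reference.split())
    if words == 0:
        raise ValueError("references contain no words")
    return edits / words
```

For example, "they enjoy it when i audition" vs "they enjoy it when i addition" is one substitution over six words (WER ≈ 0.167). The ~86.5% accuracy figure under Use Cases is simply 1 - WER for eval_wer ≈ 0.135.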

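📚 Detailed Documentation

Fine-tuning Script

The script used to train this model can be found here.

⚠️ Important Note

This model can be fine-tuned further; trainer_state.json contains useful details, namely the last recorded state (this checkpoint):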

{
    "epoch": 29.51,
    "eval_loss": 25.424150466918945,
    "eval_runtime": 182.9499,
    "eval_samples_per_second": 9.183,
    "eval_wer": 0.1351704233095107,
    "step": 8500
}
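The record above is the final evaluation entry for this checkpoint. A minimal sketch for extracting such an entry from a trainer_state.json file, assuming the Hugging Face Trainer layout in which evaluation metrics are appended to the `log_history` list; the helper names are hypothetical.

```python
import json
from typing import Any, Dict


def last_eval_entry(state: Dict[str, Any]) -> Dict[str, Any]:
    """Return the most recent log_history record that contains eval_wer."""
    evals = [entry for entry in state.get("log_history", []) if "eval_wer" in entry]
    if not evals:
        raise ValueError("trainer state has no evaluation entries with eval_wer")
    return evals[-1]


def load_last_eval(path: str) -> Dict[str, Any]:
    """Read trainer_state.json from disk and return its last evaluation entry."""
    with open(path, encoding="utf-8") as f:
        return last_eval_entry(json.load(f))


# The checkpoint values shown above, placed in a log_history list for illustration
example_state = {
    "log_history": [
        {
            "epoch": 29.51,
            "eval_loss": 25.424150466918945,
            "eval_runtime": 182.9499,
            "eval_samples_per_second": 9.183,
            "eval_wer": 0.1351704233095107,
            "step": 8500,
        }
    ]
}
entry = last_eval_entry(example_state)
print(f"step {entry['step']}: WER {entry['eval_wer']:.1%}")  # → step 8500: WER 13.5%
# Runtime × throughput approximates the number of evaluated utterances
print(round(entry["eval_runtime"] * entry["eval_samples_per_second"]))  # → 1680
```

As a sanity check, 182.95 s × 9.183 samples/s ≈ 1680 utterances, which matches the size of the TIMIT test set.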