🚀 Wav2Vec2-Large-LV60-TIMIT
本項目是在 timit_asr 數據集 上對 facebook/wav2vec2-large-lv60 進行微調後的成果。使用此模型時,請確保輸入的語音採樣率為 16kHz。
🚀 快速開始
本模型可直接使用(無需語言模型),具體操作如下:
💻 使用示例
基礎用法
import soundfile as sf
import torch
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
model_name = "hktayal345/wav2vec2-large-lv60-timit-asr"
processor = Wav2Vec2Processor.from_pretrained(model_name)
model = Wav2Vec2ForCTC.from_pretrained(model_name)
model.eval()
dataset = load_dataset("timit_asr", split="test").shuffle().select(range(10))
char_translations = str.maketrans({"-": " ", ",": "", ".": "", "?": ""})
def prepare_example(example):
example["speech"], _ = sf.read(example["file"])
example["text"] = example["text"].translate(char_translations)
example["text"] = " ".join(example["text"].split())
example["text"] = example["text"].lower()
return example
dataset = dataset.map(prepare_example, remove_columns=["file"])
inputs = processor(dataset["speech"], sampling_rate=16000, return_tensors="pt", padding="longest")
with torch.no_grad():
predicted_ids = torch.argmax(model(inputs.input_values).logits, dim=-1)
predicted_ids[predicted_ids == -100] = processor.tokenizer.pad_token_id
predicted_transcripts = processor.tokenizer.batch_decode(predicted_ids)
for reference, predicted in zip(dataset["text"], predicted_transcripts):
print("reference:", reference)
print("predicted:", predicted)
print("--")
以下是輸出示例:
reference: the emblem depicts the acropolis all aglow
predicted: the amblum depicts the acropolis all a glo
--
reference: don't ask me to carry an oily rag like that
predicted: don't ask me to carry an oily rag like that
--
reference: they enjoy it when i audition
predicted: they enjoy it when i addition
--
reference: set aside to dry with lid on sugar bowl
predicted: set aside to dry with a litt on shoogerbowl
--
reference: a boring novel is a superb sleeping pill
predicted: a bor and novel is a suberb sleeping peel
--
reference: only the most accomplished artists obtain popularity
predicted: only the most accomplished artists obtain popularity
--
reference: he has never himself done anything for which to be hated which of us has
predicted: he has never himself done anything for which to be hated which of us has
--
reference: the fish began to leap frantically on the surface of the small lake
predicted: the fish began to leap frantically on the surface of the small lake
--
reference: or certain words or rituals that child and adult go through may do the trick
predicted: or certain words or rituals that child an adult go through may do the trick
--
reference: are your grades higher or lower than nancy's
predicted: are your grades higher or lower than nancies
--
📚 詳細文檔
微調腳本
你可以在 此處 找到用於生成此模型的腳本。
注意:該模型可以進一步微調;trainer_state.json 顯示了有用的詳細信息,即最後一個狀態(此檢查點):
{
"epoch": 29.51,
"eval_loss": 25.424150466918945,
"eval_runtime": 182.9499,
"eval_samples_per_second": 9.183,
"eval_wer": 0.1351704233095107,
"step": 8500
}
📄 許可證
本項目採用 Apache-2.0 許可證。
屬性 |
詳情 |
數據集 |
timit_asr |
標籤 |
音頻、自動語音識別、語音 |
許可證 |
Apache-2.0 |