đ Speech Recognition Model for German
This project focuses on automatic speech recognition for the German language. It uses the Wav2Vec2
model to evaluate on the Common Voice German test dataset.
đ Quick Start
Prerequisites
- Install necessary libraries such as
torchaudio
, datasets
, transformers
, torch
etc.
Evaluation on Common Voice DE Test
import torchaudio
from datasets import load_dataset, load_metric
from transformers import (
Wav2Vec2ForCTC,
Wav2Vec2Processor,
)
import torch
import re
import sys
model_name = "facebook/wav2vec2-large-xlsr-53-german"
device = "cuda"
chars_to_ignore_regex = '[\,\?\.\!\-\;\:\"]'
model = Wav2Vec2ForCTC.from_pretrained(model_name).to(device)
processor = Wav2Vec2Processor.from_pretrained(model_name)
ds = load_dataset("common_voice", "de", split="test", data_dir="./cv-corpus-6.1-2020-12-11")
resampler = torchaudio.transforms.Resample(orig_freq=48_000, new_freq=16_000)
def map_to_array(batch):
speech, _ = torchaudio.load(batch["path"])
batch["speech"] = resampler.forward(speech.squeeze(0)).numpy()
batch["sampling_rate"] = resampler.new_freq
batch["sentence"] = re.sub(chars_to_ignore_regex, '', batch["sentence"]).lower().replace("â", "'")
return batch
ds = ds.map(map_to_array)
def map_to_pred(batch):
features = processor(batch["speech"], sampling_rate=batch["sampling_rate"][0], padding=True, return_tensors="pt")
input_values = features.input_values.to(device)
attention_mask = features.attention_mask.to(device)
with torch.no_grad():
logits = model(input_values, attention_mask=attention_mask).logits
pred_ids = torch.argmax(logits, dim=-1)
batch["predicted"] = processor.batch_decode(pred_ids)
batch["target"] = batch["sentence"]
return batch
result = ds.map(map_to_pred, batched=True, batch_size=16, remove_columns=list(ds.features.keys()))
wer = load_metric("wer")
print(wer.compute(predictions=result["predicted"], references=result["target"]))
Result
The Word Error Rate (WER) on the Common Voice German test dataset is 18.5%.
⨠Features
- Language Support: Specifically designed for German speech recognition.
- Model Utilization: Utilizes the
Wav2Vec2
model which is pre - trained on a large - scale dataset.
đĻ Installation
The installation steps are not provided in the original document.
đģ Usage Examples
Basic Usage
The above Python code demonstrates the basic usage of evaluating the Wav2Vec2
model on the Common Voice German test dataset.
đ Documentation
The original document does not provide detailed documentation.
đ§ Technical Details
The original document does not provide in - depth technical details.
đ License
This project is licensed under the Apache 2.0 license.