wav2vec2-large-xlsr-53-german: Open-Source Model for Automatic German Speech Recognition

Wav2vec2 Large Xlsr 53 German

Developed by facebook

Large-scale German automatic speech recognition (ASR) model based on Facebook's Wav2Vec2 architecture, fine-tuned on the Common Voice German dataset

Category: Speech Recognition | Language: German | Open Source | License: Apache-2.0
Tags: #German Speech Recognition #High Accuracy WER 18.5% #XLSR Multilingual Transfer

Downloads: 1,767

Release date: 3/2/2022

Model Overview

This model is the multilingual XLSR-53 Wav2Vec2 checkpoint fine-tuned for German speech recognition. It transcribes German speech into text.

Model Features

Large-scale Pre-training

Initialized from XLSR-53, a multilingual Wav2Vec2 model pre-trained on unlabeled speech in 53 languages, which provides strong speech representations for fine-tuning

German Optimization

Fine-tuned on German speech from Common Voice, adapting the multilingual representations to German pronunciation and spelling

High Accuracy

Achieves a word error rate (WER) of 18.5% on the Common Voice German test set
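To make the 18.5% figure concrete: WER counts the word substitutions (S), deletions (D) and insertions (I) needed to turn the model's transcript into the reference, divided by the number of reference words N, i.e. WER = (S + D + I) / N. The sketch below computes it for a single sentence pair with a standard word-level edit distance. It is an illustration only, not the implementation behind the "wer" metric used in the evaluation script; the function name and example sentences are made up, and it assumes a non-empty reference.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Assumes a non-empty reference (otherwise the division is undefined).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )
    return dp[len(ref)][len(hyp)] / len(ref)


# One substituted word out of five reference words
print(word_error_rate("ich habe heute keine zeit", "ich hab heute keine zeit"))  # 0.2
```

An 18.5% WER therefore means roughly 18.5 word errors per 100 reference words across the test set.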

Model Capabilities

German Speech Recognition

Speech-to-Text

Audio Content Transcription

Use Cases

Speech Transcription

German Speech-to-Text

Automatically convert German speech content into text format

Word error rate: 18.5% (on the Common Voice German test set)

Assistive Technology

Voice Control Applications

Provide voice control interfaces for German users

🚀 Speech Recognition Model for German

This project covers automatic speech recognition for German. The script in the Quick Start section evaluates the model on the Common Voice German test set.

🚀 Quick Start

Prerequisites

Install torch, torchaudio, transformers, datasets and evaluate, plus jiwer, which the WER metric depends on.

Evaluation on the Common Voice German (DE) Test Set

import re

import evaluate  # the WER metric lives in the evaluate library (load_metric was removed from datasets)
import torch
import torchaudio
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_name = "facebook/wav2vec2-large-xlsr-53-german"
# Fall back to CPU when no GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"

# Punctuation stripped from the reference transcripts before scoring
chars_to_ignore_regex = r'[\,\?\.\!\-\;\:\"]'

model = Wav2Vec2ForCTC.from_pretrained(model_name).to(device)
processor = Wav2Vec2Processor.from_pretrained(model_name)

# Expects the Common Voice 6.1 corpus downloaded and extracted to this local directory
ds = load_dataset("common_voice", "de", split="test", data_dir="./cv-corpus-6.1-2020-12-11")

# Common Voice clips are 48 kHz; the model expects 16 kHz input
resampler = torchaudio.transforms.Resample(orig_freq=48_000, new_freq=16_000)

def map_to_array(batch):
    # Load the clip, resample it to 16 kHz and normalize the reference text
    speech, _ = torchaudio.load(batch["path"])
    batch["speech"] = resampler(speech.squeeze(0)).numpy()
    batch["sampling_rate"] = resampler.new_freq
    batch["sentence"] = re.sub(chars_to_ignore_regex, "", batch["sentence"]).lower().replace("’", "'")
    return batch

ds = ds.map(map_to_array)

def map_to_pred(batch):
    # Pad the batch to a common length, run the model and decode greedily
    features = processor(batch["speech"], sampling_rate=batch["sampling_rate"][0], padding=True, return_tensors="pt")
    input_values = features.input_values.to(device)
    attention_mask = features.attention_mask.to(device)
    with torch.no_grad():
        logits = model(input_values, attention_mask=attention_mask).logits
    pred_ids = torch.argmax(logits, dim=-1)  # most likely token per frame
    batch["predicted"] = processor.batch_decode(pred_ids)
    batch["target"] = batch["sentence"]
    return batch

result = ds.map(map_to_pred, batched=True, batch_size=16, remove_columns=list(ds.features.keys()))

wer = evaluate.load("wer")

# compute() returns the WER as a fraction (0.185 corresponds to 18.5%)
print(wer.compute(predictions=result["predicted"], references=result["target"]))
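The reference transcripts are normalized before scoring, so the WER is measured on lowercase text with the listed punctuation removed. The sketch below isolates that step from map_to_array so its effect can be checked on its own; the function name normalize_reference and the example sentences are hypothetical. One consequence worth knowing: a hyphen is deleted rather than replaced by a space, so hyphenated compounds are scored as a single word.

```python
import re

# Same punctuation class as chars_to_ignore_regex in the evaluation script
CHARS_TO_IGNORE = r'[\,\?\.\!\-\;\:\"]'


def normalize_reference(sentence: str) -> str:
    """Strip the listed punctuation, lowercase, and map the typographic
    apostrophe (U+2019) to an ASCII apostrophe."""
    return re.sub(CHARS_TO_IGNORE, "", sentence).lower().replace("\u2019", "'")


print(normalize_reference("Guten Morgen, wie geht’s?"))  # guten morgen wie geht's
print(normalize_reference("Baden-Württemberg"))          # badenwürttemberg
```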

Result

The Word Error Rate (WER) on the Common Voice German test dataset is 18.5%.

✨ Features

Language Support: Specifically designed for German speech recognition.
Model Utilization: Builds on the XLSR-53 Wav2Vec2 model, which is pre-trained on a large-scale multilingual speech corpus.

📦 Installation

Install the libraries listed under Prerequisites with pip:

pip install torch torchaudio transformers datasets evaluate jiwer

💻 Usage Examples

Basic Usage

The evaluation script in the Quick Start section shows the basic workflow: load the processor and model, resample the audio to 16 kHz, run the model, and decode the predicted token ids to text with the processor.


🔧 Technical Details

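The model is a Wav2Vec2ForCTC checkpoint: it takes 16 kHz audio and outputs per-frame token logits. The evaluation script decodes them greedily, taking the argmax at each frame and converting the ids to text with processor.batch_decode.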
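A Wav2Vec2ForCTC model emits one prediction per audio frame, so the per-frame argmax ids still have to be collapsed into text. The sketch below shows, in simplified form, the usual CTC collapsing rule: merge consecutive repeated ids, drop the blank (padding) token, and turn the word delimiter "|" into spaces. The function name, the toy vocabulary and the frame sequence are hypothetical; in practice the vocabulary comes from the model's tokenizer and processor.batch_decode performs this step.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0, word_delimiter="|"):
    """Collapse per-frame argmax ids into text: merge consecutive repeats,
    then drop blank tokens, then map the word delimiter to spaces."""
    collapsed = [token for token, _ in groupby(frame_ids)]        # merge repeats
    chars = [id_to_char[t] for t in collapsed if t != blank_id]   # drop blanks
    return "".join(chars).replace(word_delimiter, " ").strip()


# Hypothetical toy vocabulary: id 0 is the blank/padding token, id 1 the word delimiter
vocab = {0: "<pad>", 1: "|", 2: "h", 3: "a", 4: "l", 5: "o"}

# The blank between the two runs of "l" (id 4) is what allows a double letter
frames = [2, 2, 0, 3, 4, 0, 4, 4, 5, 1, 1]
print(ctc_greedy_decode(frames, vocab))  # hallo
```

Because repeats are merged before blanks are removed, a doubled letter only survives when the model places a blank between the two runs.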

📄 License

This project is licensed under the Apache 2.0 license.
