wav2vec2-base-10k-voxpopuli-ft-de Open-source Speech Recognition Model - Accurately Handle German Speech Transcription

Wav2vec2 Base 10k Voxpopuli Ft De

Developed by facebook

A speech recognition model based on Facebook's Wav2Vec2 base model, pretrained on a 10K-hour unlabeled subset of the VoxPopuli corpus and fine-tuned on German transcription data

Speech Recognition

Transformers

German #German Speech Recognition #VoxPopuli Pretraining #Low-Resource Optimization

Downloads 46

Release Time: 3/2/2022

Model Overview

This model is a German automatic speech recognition (ASR) system that converts German speech into text. It is built on the Wav2Vec2 architecture and combines large-scale unsupervised pretraining with supervised fine-tuning on transcribed German data.

Model Features

Large-Scale Pretraining

Pretrained on 10K hours of unlabeled data from the VoxPopuli corpus, learning rich speech representations

German Optimization

Fine-tuned on transcribed German speech data for German speech recognition tasks

End-to-End Learning

Learns speech features directly from raw audio without requiring manually designed feature extractors
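
As a rough illustration of what "raw audio in" means, the sketch below prepares a waveform the way Wav2Vec2-style feature extractors typically do: the samples are scaled to zero mean and unit variance, with no spectrogram or MFCC step. The helper name `normalize_waveform`, the epsilon value, and the synthetic tone are assumptions for illustration and are not taken from the model card.

```python
import numpy as np


def normalize_waveform(waveform: np.ndarray, eps: float = 1e-7) -> np.ndarray:
    """Scale a 1-D raw waveform to zero mean and unit variance (hypothetical helper)."""
    waveform = np.asarray(waveform, dtype=np.float32)
    return (waveform - waveform.mean()) / np.sqrt(waveform.var() + eps)


# One second of a synthetic 440 Hz tone at 16 kHz stands in for real speech.
sample_rate = 16_000
t = np.arange(sample_rate) / sample_rate
tone = 0.1 * np.sin(2 * np.pi * 440.0 * t)

normalized = normalize_waveform(tone)
print(normalized.shape, float(normalized.mean()), float(normalized.std()))
```

In the usage example further down, the processor handles this kind of preparation itself, so the sketch is only meant to show that the model input is the waveform, not hand-crafted features.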

Model Capabilities

German Speech Recognition

Audio-to-Text Conversion

Speech Transcription

Use Cases

Speech Transcription

Meeting Minutes Automation

Automatically converts German meeting recordings into text transcripts
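
Meeting recordings are usually far longer than the short clips in the usage example. One common approach, which is an assumption here and not something the model card describes, is to split the audio into fixed-length overlapping windows and transcribe each window separately. The helper `chunk_bounds` and its default window and overlap lengths are hypothetical.

```python
def chunk_bounds(num_samples, sample_rate=16_000, chunk_seconds=20.0, overlap_seconds=2.0):
    """Return (start, end) sample indices of overlapping windows (hypothetical helper)."""
    chunk = int(chunk_seconds * sample_rate)
    step = chunk - int(overlap_seconds * sample_rate)
    if step <= 0:
        raise ValueError("overlap must be shorter than the chunk length")

    bounds = []
    start = 0
    while start < num_samples:
        end = min(start + chunk, num_samples)
        bounds.append((start, end))
        if end == num_samples:
            break  # the last window reached the end of the recording
        start += step
    return bounds


# A 50-second recording sampled at 16 kHz.
windows = chunk_bounds(50 * 16_000)
for start, end in windows:
    print(f"{start / 16_000:.1f}s to {end / 16_000:.1f}s")
```

Each window could then be passed to the processor and model as in the usage example. Merging the overlapping transcripts is a separate problem that this sketch does not address.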

Voice Assistants

Provides speech recognition capabilities for German voice assistants

Accessibility Technology

Real-Time Caption Generation

Generates real-time captions for German video content

🚀 Wav2Vec2-Base-VoxPopuli-Finetuned

This is a fine-tuned version of Facebook's Wav2Vec2 base model. It was pretrained on the 10K-hour unlabeled subset of the VoxPopuli corpus and fine-tuned on transcribed German data.

✨ Features

Audio Processing: Specialized for audio tasks, particularly automatic speech recognition.
Multilingual Adaptability: Based on a large-scale multilingual speech corpus (VoxPopuli).

📦 Installation

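The original model card gives no installation steps. The usage example imports transformers, datasets, torchaudio, and torch, so these packages must be available in the Python environment.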
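
As a small convenience, the sketch below checks whether the packages imported by the usage example can be found in the current environment. The package list is read off that example's import lines. The helper name `missing_packages` is hypothetical, and how the packages get installed is left to the reader's environment.

```python
from importlib.util import find_spec

# Top-level packages imported by the usage example.
REQUIRED_PACKAGES = ("transformers", "datasets", "torchaudio", "torch")


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the names that cannot be found on the import path (hypothetical helper)."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All packages used by the usage example are importable.")
```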

💻 Usage Examples

Basic Usage

#!/usr/bin/env python3
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
from datasets import load_dataset
import torchaudio
import torch

# load model & processor
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-10k-voxpopuli-ft-de")
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-10k-voxpopuli-ft-de")

# load dataset
ds = load_dataset("common_voice", "de", split="validation[:1%]")

# common voice does not match target sampling rate
common_voice_sample_rate = 48000
target_sample_rate = 16000

resampler = torchaudio.transforms.Resample(common_voice_sample_rate, target_sample_rate)


# define mapping fn to read in sound file and resample
def map_to_array(batch):
    speech, _ = torchaudio.load(batch["path"])
    speech = resampler(speech)
    batch["speech"] = speech[0]
    return batch


# load all audio files
ds = ds.map(map_to_array)

# run inference on the first 5 data samples
inputs = processor(ds[:5]["speech"], sampling_rate=target_sample_rate, return_tensors="pt", padding=True)

# inference (gradients are not needed for prediction)
with torch.no_grad():
    logits = model(**inputs).logits
predicted_ids = torch.argmax(logits, dim=-1)

print(processor.batch_decode(predicted_ids))
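
The argmax-then-decode step at the end of the example is greedy CTC decoding: pick the most likely token for each audio frame, merge consecutive repeats, then drop the blank token. The sketch below shows that collapse on a toy sequence. The vocabulary, the blank id, and the helper `ctc_greedy_collapse` are hypothetical, and the real processor additionally handles word delimiters and other special tokens.

```python
from itertools import groupby


def ctc_greedy_collapse(frame_ids, id_to_token, blank_id=0):
    """Merge repeated frame predictions, then remove blanks (hypothetical helper)."""
    merged = [token_id for token_id, _ in groupby(frame_ids)]
    return "".join(id_to_token[i] for i in merged if i != blank_id)


# Toy vocabulary; id 0 plays the role of the CTC blank token.
toy_vocab = {0: "<pad>", 1: "h", 2: "a", 3: "l", 4: "o"}

# Per-frame argmax ids; the blank between the two runs of 3 keeps the double "l".
frames = [1, 1, 0, 2, 2, 3, 0, 3, 3, 4, 0]

print(ctc_greedy_collapse(frames, toy_vocab))
```

The blank token is what lets CTC emit doubled letters: without a blank between them, the two runs of `3` would be merged into a single "l".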

📚 Documentation

Paper: VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation
Authors: Changhan Wang, Morgane Riviere, Ann Lee, Anne Wu, Chaitanya Talnikar, Daniel Haziza, Mary Williamson, Juan Pino, Emmanuel Dupoux from Facebook AI
See the official website for more information.

📄 License

This project is licensed under the CC-BY-NC-4.0 license.

Property	Details
Model Type	Fine-tuned Wav2Vec2 base model
Training Data	10K-hour unlabeled subset of the VoxPopuli corpus and transcribed German data
