wav2vec2-base-10k-voxpopuli-ft-nl Open Source Speech Recognition Model

Wav2vec2 Base 10k Voxpopuli Ft Nl

Developed by facebook

A speech recognition model based on Facebook's Wav2Vec2 architecture, pretrained on 10K hours of unlabeled speech from the VoxPopuli corpus and fine-tuned on transcribed Dutch data.

Speech Recognition

Transformers

Other #Dutch speech recognition #Multilingual pretraining #VoxPopuli fine-tuning

Downloads 28

Release Time: 3/2/2022

Model Overview

This model is an automatic speech recognition (ASR) system specifically optimized for Dutch, capable of converting Dutch speech into text.

Model Features

Multi-stage Training

Pretrained on 10K hours of unlabeled VoxPopuli data, then fine-tuned on labeled Dutch data

Dutch Optimization

Fine-tuned on transcribed Dutch speech so that recognition targets Dutch specifically

Wav2Vec2 Architecture

Built on Facebook's Wav2Vec2 architecture, which learns speech representations directly from raw audio
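
As a rough illustration of what the architecture does to raw audio: in the standard Wav2Vec2 base configuration, a stack of seven 1-D convolutions downsamples the waveform before the Transformer layers. The sketch below computes how many frames come out for a given number of input samples. The kernel sizes and strides are the defaults of the Wav2Vec2 base configuration and are an assumption here, since this card does not quote the checkpoint's own config; `num_output_frames` is a hypothetical helper name.

```python
# Default Wav2Vec2 base feature-encoder settings.
# Assumption: not read from this checkpoint's config.
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)


def num_output_frames(num_samples):
    """Return the number of frames produced for `num_samples` input samples."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        # Standard output-length formula for an unpadded 1-D convolution.
        length = (length - kernel) // stride + 1
    return length


one_second = num_output_frames(16000)  # one second of 16 kHz audio
print(one_second)  # 49
```

Under these assumed settings the combined stride is 320 samples, so each output frame corresponds to roughly 20 ms of 16 kHz audio.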

Model Capabilities

Dutch speech recognition

Audio-to-text conversion

Automatic speech transcription

Use Cases

Speech Transcription

Meeting Minutes Automation

Automatically transcribe Dutch meeting recordings into written records

Voice Assistants

Provide speech recognition capabilities for Dutch voice assistants

Accessibility Technology

Real-time Caption Generation

Generate real-time captions for Dutch video content

🚀 Wav2Vec2-Base-VoxPopuli-Finetuned

This project is based on Facebook's Wav2Vec2. The base model is pretrained on the 10K unlabeled subset of the VoxPopuli corpus and fine-tuned on the transcribed data in Dutch (refer to Table 1 of the paper for more information). It is designed for audio processing and automatic speech recognition tasks.

✨ Features

Multilingual Pretraining: The base model is pretrained on unlabeled speech from the multilingual VoxPopuli corpus before any language-specific fine-tuning.
Fine-Tuned for Dutch: The pretrained model is then fine-tuned on transcribed Dutch data for Dutch speech recognition.

📦 Installation

The original README does not list installation steps. To use this model, you need the Python libraries transformers, datasets, torchaudio, and torch. You can install them with pip:

pip install transformers datasets torchaudio torch
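
After installing, it can help to confirm that all four packages are importable before loading the model. The snippet below is a minimal sketch that uses only the standard library; `missing_packages` is a hypothetical helper name, not part of any of these libraries.

```python
import importlib.util

# Packages named in the pip command above.
REQUIRED = ["transformers", "datasets", "torchaudio", "torch"]


def missing_packages(names):
    """Return the names from `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```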

💻 Usage Examples

Basic Usage

#!/usr/bin/env python3
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
from datasets import load_dataset
import torchaudio
import torch

# load model & processor
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-10k-voxpopuli-ft-nl")
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-10k-voxpopuli-ft-nl")

# load dataset
ds = load_dataset("common_voice", "nl", split="validation[:1%]")

# Common Voice audio is sampled at 48 kHz, but the model expects 16 kHz input
common_voice_sample_rate = 48000
target_sample_rate = 16000

resampler = torchaudio.transforms.Resample(common_voice_sample_rate, target_sample_rate)


# define mapping fn to read in sound file and resample
def map_to_array(batch):
    speech, _ = torchaudio.load(batch["path"])
    speech = resampler(speech)
    batch["speech"] = speech[0]
    return batch


# load all audio files
ds = ds.map(map_to_array)

# prepare model inputs from the first 5 data samples
inputs = processor(ds[:5]["speech"], sampling_rate=target_sample_rate, return_tensors="pt", padding=True)

# run inference without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits
predicted_ids = torch.argmax(logits, dim=-1)

print(processor.batch_decode(predicted_ids))
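
The last two steps above (taking the argmax over the logits, then calling `batch_decode`) amount to greedy CTC decoding: pick the most likely token for each frame, merge consecutive repeats, and drop the blank token. The following is a minimal pure-Python sketch of that collapse rule. The toy vocabulary and the blank id are assumptions for illustration and do not match the model's real tokenizer, which also handles word delimiters and special tokens; `greedy_ctc_collapse` is a hypothetical helper name.

```python
def greedy_ctc_collapse(frame_ids, id_to_char, blank_id=0):
    """Collapse per-frame token ids into text using the greedy CTC rule."""
    chars = []
    previous = None
    for token_id in frame_ids:
        # Emit a token only when it differs from the previous frame
        # and is not the blank symbol.
        if token_id != previous and token_id != blank_id:
            chars.append(id_to_char[token_id])
        previous = token_id
    return "".join(chars)


# Toy vocabulary (hypothetical): id 0 is reserved for the CTC blank.
toy_vocab = {1: "h", 2: "a", 3: "l", 4: "o"}

# Frames: h h _ a l l _ l o
frames = [1, 1, 0, 2, 3, 3, 0, 3, 4]
decoded = greedy_ctc_collapse(frames, toy_vocab)
print(decoded)  # hallo
```

The blank between the two runs of `l` is what allows a doubled letter to survive the merge step.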

📚 Documentation

Paper: VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation

Authors: Changhan Wang, Morgane Riviere, Ann Lee, Anne Wu, Chaitanya Talnikar, Daniel Haziza, Mary Williamson, Juan Pino, Emmanuel Dupoux from Facebook AI

See the official website for more information.

📄 License

The project is licensed under the CC-BY-NC-4.0 license.

Property	Details
Model Type	Wav2Vec2-Base-VoxPopuli-Finetuned
Training Data	10K unlabeled subset of VoxPopuli corpus and transcribed Dutch data
License	CC-BY-NC-4.0
