🚀 Wav2Vec2-Base-VoxPopuli-Finetuned
This project is based on Facebook's Wav2Vec2. The base model is pretrained on the 10K unlabeled subset of the VoxPopuli corpus and fine-tuned on the transcribed data in Dutch (refer to Table 1 of the paper for more information). It is designed for audio processing and automatic speech recognition tasks.
✨ Features
- Multilingual Adaptability: Because the base model is pretrained on the multilingual VoxPopuli corpus, it shows potential for multilingual speech processing.
- Fine-Tuned for Dutch: Specifically optimized for the Dutch language, enhancing recognition accuracy.
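Recognition accuracy for speech recognition is commonly reported as word error rate (WER): the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The snippet below is a minimal sketch of that metric, assuming plain whitespace tokenization and no text normalization. The function name `word_error_rate` and the Dutch example sentences are illustrative and not part of this model's tooling.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences: one reference word ("de") is missing in the hypothesis.
print(word_error_rate("de kat zit op de mat", "de kat zit op mat"))
```

A lower value is better. Published WER figures usually involve lowercasing and punctuation removal first, which this sketch leaves out.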
📦 Installation
No specific installation steps are provided in the original README. To use this model, you need the necessary Python libraries installed, such as transformers, datasets, torchaudio, and torch. You can install them using pip:
pip install transformers datasets torchaudio torch
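After installing, it can help to confirm that the four packages are visible to the Python interpreter you plan to use. The snippet below is a small sketch using only the standard library. The helper name `missing_packages` is hypothetical, and the check only confirms that each package can be located, not that the installed versions are compatible with each other.

```python
import importlib.util

# Packages named in the install command above.
REQUIRED = ["transformers", "datasets", "torchaudio", "torch"]


def missing_packages(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```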
💻 Usage Examples
Basic Usage
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
from datasets import load_dataset
import torchaudio
import torch

# Load the fine-tuned Dutch checkpoint and its processor
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-10k-voxpopuli-ft-nl")
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-10k-voxpopuli-ft-nl")

# Take a small slice of the Dutch Common Voice validation split
ds = load_dataset("common_voice", "nl", split="validation[:1%]")

# Resample from the Common Voice rate (48 kHz) to the rate passed to the processor (16 kHz)
common_voice_sample_rate = 48000
target_sample_rate = 16000
resampler = torchaudio.transforms.Resample(common_voice_sample_rate, target_sample_rate)

def map_to_array(batch):
    # Load the audio file, resample it, and keep the first channel
    speech, _ = torchaudio.load(batch["path"])
    speech = resampler(speech)
    batch["speech"] = speech[0]
    return batch

ds = ds.map(map_to_array)

# Pad the first five utterances into one batch
inputs = processor(ds[:5]["speech"], sampling_rate=target_sample_rate, return_tensors="pt", padding=True)
logits = model(**inputs).logits

# Pick the most likely token per frame, then decode the ids to text
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids))
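The last two steps of the example, taking the argmax over the logits and then calling `batch_decode`, amount to greedy CTC decoding: choose the most likely token for each audio frame, merge consecutive repeats, and drop the blank token. The snippet below is a minimal sketch of that idea in plain Python. It assumes a hypothetical four-entry toy vocabulary with the blank at id 0, and it is not the processor's actual implementation, which also handles word delimiters and special tokens.

```python
def ctc_greedy_decode(ids, id_to_char, blank_id=0):
    """Merge consecutive repeated ids, drop blanks, and map the rest to characters."""
    chars = []
    previous = None
    for token_id in ids:
        # Emit a character only when the id changes and is not the blank.
        if token_id != previous and token_id != blank_id:
            chars.append(id_to_char[token_id])
        previous = token_id
    return "".join(chars)


# Hypothetical toy vocabulary; the real model's vocabulary differs.
toy_vocab = {0: "<pad>", 1: "h", 2: "o", 3: "i"}

# Per-frame argmax ids. The blank between the two runs of 2 keeps both "o"s.
frame_ids = [1, 1, 0, 2, 2, 0, 2, 3]
print(ctc_greedy_decode(frame_ids, toy_vocab))  # prints "hooi"
```

The blank token is what lets CTC output doubled letters: without a blank between them, two adjacent runs of the same id would be merged into a single character.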
📚 Documentation
Paper: VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation
Authors: Changhan Wang, Morgane Riviere, Ann Lee, Anne Wu, Chaitanya Talnikar, Daniel Haziza, Mary Williamson, Juan Pino, Emmanuel Dupoux from Facebook AI
See the official website for more information.
📄 License
The project is licensed under the CC-BY-NC-4.0 license.
| Property | Details |
|---|---|
| Model Type | Wav2Vec2-Base-VoxPopuli-Finetuned |
| Training Data | 10K unlabeled subset of the VoxPopuli corpus and transcribed Dutch data |
| License | CC-BY-NC-4.0 |