Wav2Vec2-Base-VoxPopuli-Finetuned
This is Facebook's Wav2Vec2 base model, pretrained on the 10K unlabeled subset of the VoxPopuli corpus and fine-tuned on the transcribed Croatian data (for more information, see Table 1 of the paper).
Quick Start
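This model performs automatic speech recognition (ASR) for Croatian. It combines Wav2Vec2 pretraining on the VoxPopuli corpus with fine-tuning on transcribed Croatian speech. The usage example in this document resamples input audio to 16 kHz before passing it to the model.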
⨠Features
- Audio Processing: Specialized for audio data and automatic speech recognition.
- Multilingual Adaptability: Based on the VoxPopuli corpus, which supports multilingual speech processing.
- Fine-Tuned for Croatian: Optimized for the Croatian language through fine-tuning on transcribed Croatian speech.
Installation
The code example uses the Python libraries transformers, datasets, torchaudio, and torch. You can install them with the following command:
pip install transformers datasets torchaudio torch
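After installation, it can help to confirm that all four packages are visible to the interpreter you plan to use. The following is a minimal sketch using only the standard library (Python 3.8+). The helper name `installed_versions` is hypothetical and not part of any of these libraries.

```python
from importlib import metadata

# Packages named in the install command above.
REQUIRED = ("transformers", "datasets", "torchaudio", "torch")


def installed_versions(packages=REQUIRED):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as missing can be installed by rerunning the pip command above.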
Usage Examples
Basic Usage
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
from datasets import load_dataset
import torchaudio
import torch
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-10k-voxpopuli-ft-hr")
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-10k-voxpopuli-ft-hr")
ds = load_dataset("common_voice", "hr", split="validation[:1%]")
common_voice_sample_rate = 48000
target_sample_rate = 16000
resampler = torchaudio.transforms.Resample(common_voice_sample_rate, target_sample_rate)
def map_to_array(batch):
    # Load the clip, resample it from 48 kHz to 16 kHz, and keep the first channel
    speech, _ = torchaudio.load(batch["path"])
    speech = resampler(speech)
    batch["speech"] = speech[0]
    return batch
ds = ds.map(map_to_array)
inputs = processor(ds[:5]["speech"], sampling_rate=target_sample_rate, return_tensors="pt", padding=True)
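# Inference only, so gradient tracking is not needed
with torch.no_grad():
    logits = model(**inputs).logits
# Pick the most likely token at every audio frame
predicted_ids = torch.argmax(logits, dim=-1)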
print(processor.batch_decode(predicted_ids))
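The argmax and batch_decode calls above together amount to greedy CTC decoding: take the most likely token for each audio frame, merge consecutive repeats, drop the blank token, and turn word delimiters into spaces. The sketch below illustrates that idea in plain Python. The toy vocabulary, the token IDs, and the function name `greedy_ctc_decode` are assumptions made for illustration; the real model ships its own vocabulary, and this is not the library's implementation.

```python
from itertools import groupby

# Hypothetical toy vocabulary, for illustration only.
BLANK = "<pad>"        # CTC blank token
WORD_DELIMITER = "|"   # marks word boundaries
ID_TO_TOKEN = {0: BLANK, 1: WORD_DELIMITER, 2: "d", 3: "a", 4: "n"}


def greedy_ctc_decode(frame_ids, id_to_token=ID_TO_TOKEN):
    """Collapse repeated frame predictions, drop blanks, and join into text."""
    # 1. Merge consecutive duplicates, e.g. [2, 2, 0, 3] -> [2, 0, 3].
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # 2. Remove the blank token, which only separates repeated characters.
    tokens = [id_to_token[i] for i in collapsed if id_to_token[i] != BLANK]
    # 3. Join characters and turn word delimiters into spaces.
    return "".join(tokens).replace(WORD_DELIMITER, " ").strip()


# One predicted token ID per audio frame (the result of argmax over logits).
frame_ids = [2, 2, 0, 3, 3, 0, 0, 4, 1, 1, 2, 0, 3]
print(greedy_ctc_decode(frame_ids))  # prints "dan da"
```

Note that a blank between two identical IDs keeps them as separate characters: `[3, 0, 3]` decodes to "aa", while `[3, 3, 3]` decodes to "a".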
Documentation
Paper: VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation
Authors: Changhan Wang, Morgane Riviere, Ann Lee, Anne Wu, Chaitanya Talnikar, Daniel Haziza, Mary Williamson, Juan Pino, Emmanuel Dupoux from Facebook AI
For more information, please visit the official website: here
License
This model is released under the CC-BY-NC-4.0 license.
| Property | Details |
|----------|---------|
| Tags | audio, automatic-speech-recognition, voxpopuli |
| License | cc-by-nc-4.0 |