# Wav2Vec2-Large-XLSR-53-Finnish
This model is fine-tuned from facebook/wav2vec2-large-xlsr-53 on Finnish datasets, enabling accurate automatic speech recognition.
## ⚠️ Important Note

This is an old model and should no longer be used! Much better, newer models are available at our organization hub: Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm-v2 and Finnish-NLP/wav2vec2-xlsr-300m-finnish-lm.
## Quick Start
This model is fine-tuned from facebook/wav2vec2-large-xlsr-53 on Finnish using the Common Voice, CSS10 Finnish, and Finnish parliament session 2 datasets.
When using this model, make sure that your speech input is sampled at 16 kHz.
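If your audio is recorded at a different rate, it must be resampled to 16 kHz first. Below is a minimal numpy sketch of the idea using plain linear interpolation; real pipelines should use `librosa.resample` or `torchaudio`, which apply proper anti-aliasing filters. The function name `resample_linear` is illustrative, not part of any library.

```python
import numpy as np

def resample_linear(y: np.ndarray, orig_sr: int, target_sr: int) -> np.ndarray:
    """Naive linear-interpolation resampler (illustration only)."""
    duration = len(y) / orig_sr
    n_out = int(round(duration * target_sr))
    # Output sample positions expressed in units of input samples
    x_out = np.linspace(0, len(y) - 1, n_out)
    x_in = np.arange(len(y))
    return np.interp(x_out, x_in, y)

# One second of a 440 Hz tone at 44.1 kHz, brought down to the 16 kHz the model expects
t = np.linspace(0, 1, 44100, endpoint=False)
tone = np.sin(2 * np.pi * 440 * t)
tone_16k = resample_linear(tone, 44100, 16000)
print(tone_16k.shape)  # (16000,)
```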
## Features

- Audio Processing: Capable of handling audio data for speech recognition tasks.
- Fine-Tuned: Specifically fine-tuned on Finnish datasets for better performance in Finnish speech recognition.
## Installation
No specific installation steps are provided in the original document.
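The usage examples below rely on a typical Hugging Face audio stack. The package names here are the usual ones for these imports, not taken from the original card:

```shell
# Install the libraries used in the usage examples
pip install transformers datasets torch torchaudio librosa
```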
## Usage Examples

### Basic Usage
```python
import librosa
import torch
import torchaudio
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

test_dataset = load_dataset("common_voice", "fi", split="test[:2%]")

processor = Wav2Vec2Processor.from_pretrained("aapot/wav2vec2-large-xlsr-53-finnish")
model = Wav2Vec2ForCTC.from_pretrained("aapot/wav2vec2-large-xlsr-53-finnish")

# Resample any input audio to the 16 kHz rate the model expects
# (keyword arguments are required by librosa >= 0.10)
resampler = lambda sr, y: librosa.resample(y.numpy().squeeze(), orig_sr=sr, target_sr=16_000)

def speech_file_to_array_fn(batch):
    speech_array, sampling_rate = torchaudio.load(batch["path"])
    batch["speech"] = resampler(sampling_rate, speech_array).squeeze()
    return batch

test_dataset = test_dataset.map(speech_file_to_array_fn)
inputs = processor(test_dataset["speech"][:2], sampling_rate=16_000, return_tensors="pt", padding=True)

with torch.no_grad():
    logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits

predicted_ids = torch.argmax(logits, dim=-1)
print("Prediction:", processor.batch_decode(predicted_ids))
print("Reference:", test_dataset["sentence"][:2])
```
### Advanced Usage
```python
import re

import librosa
import torch
import torchaudio
from datasets import load_dataset, load_metric
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

test_dataset = load_dataset("common_voice", "fi", split="test")
wer = load_metric("wer")

processor = Wav2Vec2Processor.from_pretrained("aapot/wav2vec2-large-xlsr-53-finnish")
model = Wav2Vec2ForCTC.from_pretrained("aapot/wav2vec2-large-xlsr-53-finnish")
model.to("cuda")

# Punctuation and special characters stripped from references before scoring
# (the exact character set in the original card was garbled; this is the
# usual set for these Common Voice evaluation scripts)
chars_to_ignore_regex = '[\,\?\.\!\-\;\:\"\“\%\‘\”\'\…\–\é]'

# Resample any input audio to the 16 kHz rate the model expects
resampler = lambda sr, y: librosa.resample(y.numpy().squeeze(), orig_sr=sr, target_sr=16_000)

def speech_file_to_array_fn(batch):
    batch["sentence"] = re.sub(chars_to_ignore_regex, '', batch["sentence"]).lower()
    speech_array, sampling_rate = torchaudio.load(batch["path"])
    batch["speech"] = resampler(sampling_rate, speech_array).squeeze()
    return batch

test_dataset = test_dataset.map(speech_file_to_array_fn)

def evaluate(batch):
    inputs = processor(batch["speech"], sampling_rate=16_000, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(inputs.input_values.to("cuda"), attention_mask=inputs.attention_mask.to("cuda")).logits
    pred_ids = torch.argmax(logits, dim=-1)
    batch["pred_strings"] = processor.batch_decode(pred_ids)
    return batch

result = test_dataset.map(evaluate, batched=True, batch_size=8)
print("WER: {:.2f}".format(100 * wer.compute(predictions=result["pred_strings"], references=result["sentence"])))
```
Test Result (WER): 32.378771 %
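The reported score is a word error rate (WER): the word-level edit distance between hypothesis and reference, divided by the reference length. A minimal pure-Python sketch of the metric, for illustration only; the evaluation script above uses the `wer` metric from `datasets` instead, and the function name here is hypothetical:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed as a word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word out of three -> WER of 1/3
print(word_error_rate("kissa istuu puussa", "kissa istui puussa"))
```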
## Documentation
### Training

The Common Voice train, validation and other splits were used for training, together with the CSS10 Finnish and Finnish parliament session 2 datasets.

The script used for training can be found in a Google Colab notebook.
## License

This model is licensed under the apache-2.0 license.
## Information Table

| Property | Details |
|----------|---------|
| Model Type | Wav2Vec2-Large-XLSR-53-Finnish |
| Training Data | Common Voice (train, validation, other), CSS10 Finnish, Finnish parliament session 2 |
| Metrics | WER (Word Error Rate) |
| Tags | audio, automatic-speech-recognition, speech, xlsr-fine-tuning-week |
| License | apache-2.0 |