🚀 wav2vec2-large-xlsr-53-icelandic-ep30-967h
The "wav2vec2-large-xlsr-53-icelandic-ep30-967h" is an acoustic model for Automatic Speech Recognition in Icelandic. It is the result of fine-tuning the facebook/wav2vec2-large-xlsr-53 model for 30 epochs on 967 hours of Icelandic data. This data was collected by the Language and Voice Laboratory via the Samrómur platform.
🚀 Quick Start
The "wav2vec2-large-xlsr-53-icelandic-ep30-967h" can be used directly for Icelandic Automatic Speech Recognition. It was obtained by fine-tuning a pre-trained model on 967 hours of Icelandic speech; the word error rates measured on several test sets are listed under Evaluation Results.
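For a single recording, transcription could look like the minimal sketch below. It reuses the model identifier and the 16 kHz sampling rate from the usage example in this card. The function name transcribe, its parameters, and the choice to reject other sampling rates rather than resample are assumptions made for illustration, not part of the model repository.

```python
MODEL_NAME = "language-and-voice-lab/wav2vec2-large-xlsr-53-icelandic-ep30-967h"
EXPECTED_SAMPLING_RATE = 16_000  # rate used in this card's usage example


def transcribe(waveform, sampling_rate=EXPECTED_SAMPLING_RATE, device="cpu"):
    """Return the greedy CTC transcript of one mono waveform (1-D float array).

    Hypothetical helper for illustration; not part of the model repository.
    """
    if sampling_rate != EXPECTED_SAMPLING_RATE:
        raise ValueError(
            f"expected {EXPECTED_SAMPLING_RATE} Hz audio, "
            f"got {sampling_rate} Hz; resample first"
        )

    # Heavy dependencies are imported lazily so the check above stays cheap.
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(MODEL_NAME)
    model = Wav2Vec2ForCTC.from_pretrained(MODEL_NAME).to(device).eval()

    # Normalize the waveform and wrap it in a batch of size one.
    inputs = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values.to(device)).logits

    # Take the most likely token per frame; batch_decode collapses repeats
    # and drops padding tokens to produce the final text.
    pred_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(pred_ids)[0]
```

Calling transcribe(samples) with a NumPy array of 16 kHz mono samples downloads the model on first use and returns the decoded text. When transcribing many files, loading the processor and model once outside the function would avoid repeated setup.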
✨ Features
- Icelandic-Specific: Tailored for Automatic Speech Recognition in Icelandic.
- Fine-Tuned: Based on facebook/wav2vec2-large-xlsr-53, fine-tuned for 30 epochs on 967 hours of Icelandic data.
- Training Data: Trained on the Samrómur Milljón corpus.
📦 Installation
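The model card does not list installation steps. The usage example below imports torch, transformers, datasets, and numpy, so those Python packages need to be installed.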
💻 Usage Examples
Basic Usage
import numpy as np
import torch
from datasets import load_dataset, load_metric, Audio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

MODEL_NAME = "language-and-voice-lab/wav2vec2-large-xlsr-53-icelandic-ep30-967h"
processor = Wav2Vec2Processor.from_pretrained(MODEL_NAME)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_NAME)

# Load the test split and resample the audio to 16 kHz.
ds = load_dataset("language-and-voice-lab/samromur_children", split="test")
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

def prepare_dataset(batch):
    # Convert the waveform to model inputs and the transcript to label ids.
    audio = batch["audio"]
    batch["input_values"] = processor(audio["array"], sampling_rate=audio["sampling_rate"]).input_values[0]
    with processor.as_target_processor():
        batch["labels"] = processor(batch["normalized_text"]).input_ids
    return batch

ds = ds.map(prepare_dataset, remove_columns=ds.column_names, num_proc=1)

# Newer releases of datasets no longer provide load_metric;
# evaluate.load("wer") from the evaluate package is the replacement there.
wer_metric = load_metric("wer")

def compute_metrics(pred):
    # Trainer-style metric hook; it is not called by the evaluation below.
    pred_logits = pred.predictions
    pred_ids = np.argmax(pred_logits, axis=-1)
    # Restore the padding id where the labels were masked with -100.
    pred.label_ids[pred.label_ids == -100] = processor.tokenizer.pad_token_id
    pred_str = processor.batch_decode(pred_ids)
    label_str = processor.batch_decode(pred.label_ids, group_tokens=False)
    wer = wer_metric.compute(predictions=pred_str, references=label_str)
    return {"wer": wer}

model = model.to(torch.device("cuda"))

def map_to_result(batch):
    # Run greedy CTC decoding on one example and keep the reference text.
    with torch.no_grad():
        input_values = torch.tensor(batch["input_values"], device="cuda").unsqueeze(0)
        logits = model(input_values).logits
    pred_ids = torch.argmax(logits, dim=-1)
    batch["pred_str"] = processor.batch_decode(pred_ids)[0]
    batch["sentence"] = processor.decode(batch["labels"], group_tokens=False)
    return batch

results = ds.map(map_to_result, remove_columns=ds.column_names)
print("Test WER: {:.3f}".format(wer_metric.compute(predictions=results["pred_str"], references=results["sentence"])))
Test Result: 0.076 (WER as a fraction, i.e. about 7.6%)
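The evaluation script above decodes model predictions with the default settings but decodes the reference labels with group_tokens=False. The toy sketch below illustrates why, assuming the usual CTC convention in which repeated frame predictions are merged and padding tokens act as blanks. The function ctc_decode and the six-token vocabulary are hypothetical and only mimic the behaviour of the real tokenizer in simplified form.

```python
def ctc_decode(token_ids, id_to_token, pad_id=0, word_delimiter="|", group_tokens=True):
    """Toy CTC decoding for illustration; not the tokenizer's actual code."""
    if group_tokens:
        # Merge runs of identical consecutive ids (one id per audio frame).
        grouped = []
        previous = None
        for token_id in token_ids:
            if token_id != previous:
                grouped.append(token_id)
            previous = token_id
        token_ids = grouped
    # Drop the padding/blank token, then turn word delimiters into spaces.
    chars = [id_to_token[t] for t in token_ids if t != pad_id]
    return "".join(chars).replace(word_delimiter, " ").strip()


# Hypothetical vocabulary, far smaller than the model's real one.
TOY_VOCAB = {0: "<pad>", 1: "|", 2: "h", 3: "a", 4: "l", 5: "ó"}

# Frame-level predictions: the blank (0) between the two "l" ids keeps them apart.
frame_predictions = [2, 2, 0, 3, 3, 4, 0, 4, 5, 5, 1, 1]
# Reference labels contain one id per character and no blanks.
label_ids = [2, 3, 4, 4, 5]

print(ctc_decode(frame_predictions, TOY_VOCAB))            # halló
print(ctc_decode(label_ids, TOY_VOCAB, group_tokens=False))  # halló
print(ctc_decode(label_ids, TOY_VOCAB, group_tokens=True))   # haló
```

Grouping is needed for frame-level predictions, but applying it to reference labels would wrongly merge genuine double letters, which is why the labels are decoded without it.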
📚 Documentation
Model Details
The model "wav2vec2-large-xlsr-53-icelandic-ep30-967h" is fine-tuned from facebook/wav2vec2-large-xlsr-53. The fine-tuning was carried out for 30 epochs using 967 hours of Icelandic data from the Samrómur Milljón corpus.
Evaluation Results
| Dataset | Split | WER (%) |
|---|---|---|
| Samrómur | Test | 7.698 |
| Samrómur | Dev | 6.786 |
| Samrómur Children | Test | 6.467 |
| Samrómur Children | Dev | 4.234 |
| Malrómur | Test | 6.631 |
| Malrómur | Dev | 5.836 |
| Althingi | Test | 17.904 |
| Althingi | Dev | 17.931 |
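Word error rate is the number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words; the figures in the table are that ratio expressed as a percentage. The sketch below shows one way to compute it over a set of utterances by pooling errors and reference words. The function names and the two example sentences are illustrative assumptions, and this is not the implementation behind the wer metric used in the usage example.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance between two token lists."""
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def corpus_wer(references, hypotheses):
    """Pooled WER: total word errors divided by total reference words."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    errors = 0
    words = 0
    for reference, hypothesis in zip(references, hypotheses):
        ref_words = reference.split()
        errors += edit_distance(ref_words, hypothesis.split())
        words += len(ref_words)
    if words == 0:
        raise ValueError("references contain no words")
    return errors / words


# Made-up example: one deleted word out of six reference words.
references = ["þetta er lítið próf", "góðan daginn"]
hypotheses = ["þetta er próf", "góðan daginn"]
wer = corpus_wer(references, hypotheses)
print(f"WER: {wer:.3f} ({100 * wer:.3f}%)")  # WER: 0.167 (16.667%)
```

Multiplying the fraction by 100 gives the percentage form used in the table.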
📄 License
This model is released under the CC-BY-4.0 license.
📖 BibTeX entry and citation info
When publishing results based on these models, please refer to:
@inproceedings{mena2024samromur,
title={Samr{\'o}mur Millj{\'o}n: An ASR Corpus of One Million Verified Read Prompts in Icelandic},
author={Mena, Carlos Daniel Hernandez and Gunnarsson, {\TH}orsteinn Da{\dh}i and Gu{\dh}nason, J{\'o}n},
booktitle={Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
pages={14305--14312},
year={2024}
}
🙏 Acknowledgements
- Thanks to Jón Guðnason, head of the Language and Voice Lab, for providing computational power.
- Thanks to the "Language Technology Programme for Icelandic 2019-2023", managed and coordinated by Almannarómur and funded by the Icelandic Ministry of Education, Science and Culture.
- Special thanks to Björn Ingi Stefánsson for setting up the server configuration for model training.