wav2vec2-large-xlsr-53-icelandic-ep30-967h Open-source Acoustic Model - Precise Automatic Speech Recognition for Icelandic

Developed by language-and-voice-lab

An acoustic model fine-tuned specifically for Icelandic automatic speech recognition tasks, trained on 967 hours of Icelandic data

Release Time : 7/30/2023

Model Overview

This model is the result of fine-tuning facebook/wav2vec2-large-xlsr-53 for 30 epochs, specifically designed for Icelandic automatic speech recognition tasks.

Model Features

Specifically for Icelandic

Optimized specifically for Icelandic, fine-tuned using 967 hours of Icelandic data

High-quality training data

Uses the Samrómur Milljón corpus, containing 1 million automatically verified recordings

Excellent performance

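Reaches a WER between 4.234 and 7.698 on the Samrómur, Samrómur Children, and Malrómur dev and test sets. The lowest value, 4.234, is on the Samrómur Children dev set. WER is higher on the Althingi sets (about 17.9).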

Model Capabilities

Icelandic speech recognition

Speech-to-text

Automatic speech transcription

Use Cases

Speech transcription

Children's speech recognition

Recognize the content of children's speech

The WER is 6.467 on the Samrómur Children test set

Parliamentary speech transcription

Transcribe the content of Icelandic parliamentary speeches

The WER is 17.904 on the Althingi test set

🚀 wav2vec2-large-xlsr-53-icelandic-ep30-967h

wav2vec2-large-xlsr-53-icelandic-ep30-967h is an acoustic model for automatic speech recognition in Icelandic. It was obtained by fine-tuning facebook/wav2vec2-large-xlsr-53 for 30 epochs on 967 hours of Icelandic data collected by the Language and Voice Laboratory through the Samrómur platform.

🚀 Quick Start

The model can be used directly for Icelandic automatic speech recognition with the Hugging Face transformers library, as the Usage Examples section shows. It is a pre-trained model fine-tuned on a large amount of Icelandic data, so no further training is needed for inference.
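For transcribing a single recording, a minimal sketch could look like the following. It is not part of the original model card: it assumes the transformers `pipeline` helper with the `automatic-speech-recognition` task, an ffmpeg installation for decoding audio files, and a hypothetical helper name `transcribe_file`.

```python
MODEL_NAME = "language-and-voice-lab/wav2vec2-large-xlsr-53-icelandic-ep30-967h"


def transcribe_file(path, model_name=MODEL_NAME):
    """Return the transcript of one audio file (hypothetical helper)."""
    # Imported lazily so this module loads even without transformers installed.
    from transformers import pipeline

    # The checkpoint is downloaded on first use. When given a file path,
    # the pipeline decodes the audio at the sampling rate the model expects.
    asr = pipeline("automatic-speech-recognition", model=model_name)
    return asr(path)["text"]
```

Calling `transcribe_file("sample.wav")`, where `sample.wav` is a placeholder for a local recording, would return the recognized text as a string. Building the pipeline once and reusing it is preferable when many files are transcribed.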

✨ Features

Icelandic-Specific: Tailored for automatic speech recognition in Icelandic.
Fine-Tuned: Based on facebook/wav2vec2-large-xlsr-53, fine-tuned for 30 epochs on 967 hours of Icelandic data.
Training and Evaluation Data: Trained on the Samrómur Milljón corpus and evaluated on the Samrómur, Samrómur Children, Malrómur, and Althingi sets.

📦 Installation

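The original model card lists no installation steps. The usage example below relies on the torch, transformers, datasets, and numpy packages, plus an implementation of the WER metric, so these must be installed first.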
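Before running the usage example, the environment can be checked with a short script such as this one. The package list is inferred from the imports in the usage example; `jiwer` is an assumption (the Hugging Face WER metric relies on it), and `evaluate` would need to be added if the metric is loaded through that package.

```python
from importlib.util import find_spec

# Packages the usage example imports. "jiwer" is an assumption:
# it backs the Hugging Face WER metric but is not named in the model card.
REQUIRED = ["torch", "transformers", "datasets", "numpy", "jiwer"]


def missing_packages(names):
    """Return the names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("Missing packages:", ", ".join(missing) if missing else "none")
```

Any name the script reports can then be installed with the package manager of your choice.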

💻 Usage Examples

Basic Usage

import numpy as np
import torch
import evaluate
from datasets import load_dataset, Audio
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

# Load the processor and model.
MODEL_NAME = "language-and-voice-lab/wav2vec2-large-xlsr-53-icelandic-ep30-967h"
processor = Wav2Vec2Processor.from_pretrained(MODEL_NAME)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_NAME)

# Load the dataset.
ds = load_dataset("language-and-voice-lab/samromur_children", split="test")

# Resample to 16 kHz.
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

# Process the dataset.
def prepare_dataset(batch):
    audio = batch["audio"]
    # Batched output is "un-batched" to ensure mapping is correct.
    batch["input_values"] = processor(audio["array"], sampling_rate=audio["sampling_rate"]).input_values[0]
    # Tokenize the reference text; text= replaces the deprecated as_target_processor() context.
    batch["labels"] = processor(text=batch["normalized_text"]).input_ids
    return batch
ds = ds.map(prepare_dataset, remove_columns=ds.column_names, num_proc=1)

# Define the evaluation metric.
# load_metric is no longer available in recent datasets releases; evaluate.load replaces it.
wer_metric = evaluate.load("wer")
def compute_metrics(pred):
    pred_logits = pred.predictions
    pred_ids = np.argmax(pred_logits, axis=-1)
    pred.label_ids[pred.label_ids == -100] = processor.tokenizer.pad_token_id
    pred_str = processor.batch_decode(pred_ids)
    # We do not want to group tokens when computing the metrics.
    label_str = processor.batch_decode(pred.label_ids, group_tokens=False)
    wer = wer_metric.compute(predictions=pred_str, references=label_str)
    return {"wer": wer}

# Do the evaluation (with batch_size=1), on the GPU when one is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
def map_to_result(batch):
    with torch.no_grad():
        input_values = torch.tensor(batch["input_values"], device=device).unsqueeze(0)
        logits = model(input_values).logits
    pred_ids = torch.argmax(logits, dim=-1)
    batch["pred_str"] = processor.batch_decode(pred_ids)[0]
    batch["sentence"] = processor.decode(batch["labels"], group_tokens=False)
    return batch
results = ds.map(map_to_result, remove_columns=ds.column_names)

# Compute the overall WER (printed as a fraction, not a percentage).
print("Test WER: {:.3f}".format(wer_metric.compute(predictions=results["pred_str"], references=results["sentence"])))

Test Result: 0.076
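The evaluation script takes `torch.argmax` over the logits and passes the result to `processor.batch_decode`, which performs greedy CTC decoding: consecutive repeated ids are merged and the blank token is dropped. The sketch below illustrates that idea in plain Python. It is a simplified stand-in rather than the tokenizer's actual implementation, and the toy vocabulary, the blank id, and the function name are assumptions made for the example. It also shows why the reference labels are decoded with `group_tokens=False`: labels contain no blanks, so merging repeats would delete genuine double letters.

```python
# Toy vocabulary (hypothetical); id 0 plays the role of the CTC blank,
# which wav2vec2 tokenizers represent with the padding token.
VOCAB = {0: "<pad>", 1: "h", 2: "e", 3: "l", 4: "o"}
BLANK_ID = 0


def ctc_greedy_decode(frame_ids, id_to_char, blank_id, group_tokens=True):
    """Turn a sequence of per-frame token ids into text."""
    chars = []
    previous = None
    for token_id in frame_ids:
        # With grouping on, a run of identical ids counts as one emission.
        if group_tokens and token_id == previous:
            continue
        previous = token_id
        if token_id != blank_id:
            chars.append(id_to_char[token_id])
    return "".join(chars)


# Model output: repeated frames, with a blank separating the two "l"s.
frames = [1, 1, 0, 2, 3, 3, 0, 3, 4, 4]
print(ctc_greedy_decode(frames, VOCAB, BLANK_ID))  # hello

# Reference labels: no blanks, so grouping would merge the double "l".
labels = [1, 2, 3, 3, 4]
print(ctc_greedy_decode(labels, VOCAB, BLANK_ID))                      # helo
print(ctc_greedy_decode(labels, VOCAB, BLANK_ID, group_tokens=False))  # hello
```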

📚 Documentation

Model Details

The model wav2vec2-large-xlsr-53-icelandic-ep30-967h is fine-tuned from facebook/wav2vec2-large-xlsr-53. The fine-tuning was carried out for 30 epochs using 967 hours of Icelandic data from the Samrómur Milljón corpus.

Evaluation Results

Dataset	Split	WER (%)
Samrómur	Test	7.698
Samrómur	Dev	6.786
Samrómur Children	Test	6.467
Samrómur Children	Dev	4.234
Malrómur	Test	6.631
Malrómur	Dev	5.836
Althingi	Test	17.904
Althingi	Dev	17.931
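For readers unfamiliar with the metric, WER is the number of word-level substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The sketch below computes it over a small corpus in plain Python. It is an illustration under stated assumptions, not the code used to produce the table: the sentence pairs are made up, and the function names are hypothetical. The usage example prints WER as a fraction, while the table values appear to be percentages, so the last line shows the conversion.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance, computed one row at a time."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(references, hypotheses):
    """Corpus-level WER: total word edits divided by total reference words."""
    total_edits = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words, hyp_words = ref.split(), hyp.split()
        total_edits += edit_distance(ref_words, hyp_words)
        total_words += len(ref_words)
    return total_edits / total_words


# Made-up sentence pairs: one substitution, one deletion, one exact match.
references = ["hún fór heim í gær", "takk fyrir mig", "góðan dag"]
hypotheses = ["hún fór heim í dag", "takk fyrir", "góðan dag"]

wer = word_error_rate(references, hypotheses)
print("WER as a fraction: {:.3f}".format(wer))    # 0.200
print("WER in percent: {:.3f}".format(100 * wer)) # 20.000
```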

📄 License

This model is released under the CC-BY-4.0 license.

📖 BibTeX entry and citation info

When publishing results based on this model, please cite:

@inproceedings{mena2024samromur,
  title={Samr{\'o}mur Millj{\'o}n: An ASR Corpus of One Million Verified Read Prompts in Icelandic},
  author={Mena, Carlos Daniel Hernandez and Gunnarsson, {\TH}orsteinn Da{\dh}i and Gu{\dh}nason, J{\'o}n},
  booktitle={Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
  pages={14305--14312},
  year={2024}
}

🙏 Acknowledgements

Thanks to Jón Guðnason, head of the Language and Voice Lab, for providing computational power.
Thanks to the "Language Technology Programme for Icelandic 2019-2023", managed and coordinated by Almannarómur and funded by the Icelandic Ministry of Education, Science and Culture.
Special thanks to Björn Ingi Stefánsson for setting up the server configuration for model training.
