wav2vec2-large-xlsr-53-euskera Open-Source Speech Recognition Model

Wav2vec2 Large Xlsr 53 Euskera

Developed by mrm8488

A speech recognition model fine-tuned on the Basque language (Euskera) using the Common Voice dataset, based on the facebook/wav2vec2-large-xlsr-53 model.

Speech Recognition | Other | Open Source | License: Apache-2.0
Tags: #Basque speech recognition #High precision WER 24.03 #XLSR-53 fine-tuning

Downloads 28

Release Time: 3/2/2022

Model Overview

This model is an automatic speech recognition (ASR) model optimized for the Basque language, capable of converting Basque speech into text.

Model Features

Basque language optimization

Fine-tuned specifically on Basque speech, reaching a WER of 24.03% on the Common Voice test set

Based on XLSR-53

Built on facebook/wav2vec2-large-xlsr-53, a wav2vec 2.0 model pretrained on speech from 53 languages

No language model required

The usage examples decode the model output directly (argmax over the logits), with no external language model

Model Capabilities

Basque speech recognition

Speech-to-text

16kHz audio processing

Use Cases

Speech transcription

Basque speech transcription

Convert Basque speech content into text

Achieved a WER of 24.03% on the Common Voice test set

Voice assistants

Basque voice command recognition

Used to develop voice assistant applications supporting Basque

🚀 Wav2Vec2-Large-XLSR-53-euskera

This model is facebook/wav2vec2-large-xlsr-53 fine-tuned on Euskera (Basque) using the Common Voice dataset. Make sure your speech input is sampled at 16 kHz when using this model.

🚀 Quick Start

The model can be used directly, without a language model. It is facebook/wav2vec2-large-xlsr-53 fine-tuned on Euskera using the Common Voice dataset, and it expects speech input sampled at 16 kHz.
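Because the model expects 16 kHz input, it can help to check a recording's sampling rate before transcribing it. The following is a minimal sketch for WAV files using only the Python standard library. The helper names (`wav_sample_rate`, `needs_resampling`, `write_silence`) are hypothetical and not part of the model card, and the silent clips exist only to exercise the check.

```python
import os
import tempfile
import wave

TARGET_RATE = 16_000  # the rate the model expects, per the note above


def wav_sample_rate(path):
    """Read the sampling rate from a WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path, target_rate=TARGET_RATE):
    """Return True when the file must be resampled before inference."""
    return wav_sample_rate(path) != target_rate


def write_silence(path, rate, seconds=1):
    """Write a mono 16-bit WAV of silence, only to demonstrate the check."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(rate)
        wav_file.writeframes(b"\x00\x00" * rate * seconds)


with tempfile.TemporaryDirectory() as tmp_dir:
    clip_48k = os.path.join(tmp_dir, "clip_48k.wav")
    clip_16k = os.path.join(tmp_dir, "clip_16k.wav")
    write_silence(clip_48k, 48_000)
    write_silence(clip_16k, 16_000)
    result_48k = needs_resampling(clip_48k)
    result_16k = needs_resampling(clip_16k)

print(result_48k)  # True: a 48 kHz clip has to be resampled first
print(result_16k)  # False: a 16 kHz clip can be used as-is
```

A file flagged this way can then be resampled, for example with `torchaudio.transforms.Resample` as in the usage examples.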

✨ Features

Audio Processing: Specialized for Euskera speech recognition on 16 kHz audio.
Fine-Tuned: Based on the facebook/wav2vec2-large-xlsr-53 model.

📦 Installation

The original model card gives no installation steps. The usage examples below import torch, torchaudio, datasets, and transformers, so those packages need to be available.

💻 Usage Examples

Basic Usage

import torch
import torchaudio
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

test_dataset = load_dataset("common_voice", "eu", split="test[:2%]")

processor = Wav2Vec2Processor.from_pretrained("mrm8488/wav2vec2-large-xlsr-53-euskera")
model = Wav2Vec2ForCTC.from_pretrained("mrm8488/wav2vec2-large-xlsr-53-euskera")

# The clips are assumed to be sampled at 48 kHz; the model expects 16 kHz.
resampler = torchaudio.transforms.Resample(48_000, 16_000)

# Preprocessing the datasets.
# We need to read the audio files as arrays.
def speech_file_to_array_fn(batch):
  speech_array, sampling_rate = torchaudio.load(batch["path"])
  batch["speech"] = resampler(speech_array).squeeze().numpy()
  return batch

test_dataset = test_dataset.map(speech_file_to_array_fn)
inputs = processor(test_dataset["speech"][:2], sampling_rate=16_000, return_tensors="pt", padding=True)

with torch.no_grad():
  logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits

predicted_ids = torch.argmax(logits, dim=-1)

print("Prediction:", processor.batch_decode(predicted_ids))
print("Reference:", test_dataset["sentence"][:2])
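The `argmax` followed by `batch_decode` in the example above amounts to greedy CTC decoding: take the most likely symbol per audio frame, merge consecutive repeats, and drop the blank symbol. The sketch below illustrates that collapse step in plain Python. The function name, the toy vocabulary, and `blank_id=0` are assumptions made for illustration; the real IDs come from the model's own vocabulary, and this is not the processor's actual implementation.

```python
from itertools import groupby


def greedy_ctc_collapse(frame_ids, blank_id=0):
    """Collapse per-frame argmax IDs into a label sequence (greedy CTC).

    blank_id=0 is an assumption for this sketch; the real blank ID
    depends on the model's vocabulary.
    """
    # Step 1: merge runs of identical consecutive IDs into a single ID.
    merged = [token_id for token_id, _ in groupby(frame_ids)]
    # Step 2: drop the blank symbol, which only separates genuine repeats.
    return [token_id for token_id in merged if token_id != blank_id]


# Hypothetical toy vocabulary, not the model's real one.
toy_vocab = {1: "k", 2: "a", 3: "i", 4: "x", 5: "o"}
frames = [1, 1, 0, 2, 2, 2, 3, 0, 0, 4, 5, 5]

labels = greedy_ctc_collapse(frames)
text = "".join(toy_vocab[token_id] for token_id in labels)
print(text)  # kaixo
```

The blank is what lets a doubled letter survive: `[2, 0, 2]` collapses to two separate labels, while `[2, 2]` collapses to one.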

Advanced Usage

import torch
import torchaudio
from datasets import load_dataset, load_metric
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
import re

test_dataset = load_dataset("common_voice", "eu", split="test")
wer = load_metric("wer")

processor = Wav2Vec2Processor.from_pretrained("mrm8488/wav2vec2-large-xlsr-53-euskera")
model = Wav2Vec2ForCTC.from_pretrained("mrm8488/wav2vec2-large-xlsr-53-euskera")
model.to("cuda")

chars_to_ignore_regex = '[\\,\\?\\.\\!\\-\\;\\:\\\"\\“\\%\\‘\\”\\�]'
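# The clips are assumed to be sampled at 48 kHz; the model expects 16 kHz.
resampler = torchaudio.transforms.Resample(48_000, 16_000)

# Preprocessing the datasets.
# We need to normalize the reference text and read the audio files as arrays.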
def speech_file_to_array_fn(batch):
  batch["sentence"] = re.sub(chars_to_ignore_regex, '', batch["sentence"]).lower()
  speech_array, sampling_rate = torchaudio.load(batch["path"])
  batch["speech"] = resampler(speech_array).squeeze().numpy()
  return batch

test_dataset = test_dataset.map(speech_file_to_array_fn)

# Run batched inference on the GPU and decode the predictions
# with a greedy argmax over the logits (no language model).
def evaluate(batch):
  inputs = processor(batch["speech"], sampling_rate=16_000, return_tensors="pt", padding=True)

  with torch.no_grad():
    logits = model(inputs.input_values.to("cuda"), attention_mask=inputs.attention_mask.to("cuda")).logits

  pred_ids = torch.argmax(logits, dim=-1)
  batch["pred_strings"] = processor.batch_decode(pred_ids)
  return batch

result = test_dataset.map(evaluate, batched=True, batch_size=8)

# Report WER as a percentage with two decimal places.
print("WER: {:.2f}".format(100 * wer.compute(predictions=result["pred_strings"], references=result["sentence"])))
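The evaluation script strips punctuation from the reference sentences and lowercases them before scoring, so that WER compares words rather than formatting. The snippet below sketches that normalization on its own. The character set is a shortened, illustrative version of `chars_to_ignore_regex`, the function name is hypothetical, and the sample sentence is made up rather than taken from Common Voice.

```python
import re

# Shortened, illustrative punctuation set (an assumption for this sketch),
# written as a raw string so the backslash needs no doubling.
PUNCTUATION = re.compile(r'[,?.!\-;:"“%‘”]')


def normalize_transcript(sentence):
    """Strip punctuation and lowercase a reference sentence."""
    return PUNCTUATION.sub("", sentence).lower()


print(normalize_transcript("Kaixo, zer moduz?"))  # kaixo zer moduz
```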

📚 Documentation

Evaluation

The model can be evaluated on the Euskera test data of Common Voice using the provided code.

Test Result: 24.03% WER

Training

The Common Voice train and validation datasets were used for training. The original model card does not specify the training script.

📄 License

This project is licensed under the Apache-2.0 license.

📦 Model Information

Property	Details
Model Type	Wav2Vec2-Large-XLSR-53-euskera
Training Data	Common Voice `train`, `validation` datasets
Tags	audio, automatic-speech-recognition, speech, xlsr-fine-tuning-week
License	apache-2.0
Model Name	XLSR Wav2Vec2 Euskera Manuel Romero
Task	Speech Recognition (automatic-speech-recognition)
Dataset	Common Voice eu (common_voice, args: eu)
Metrics (Test WER)	24.03
