Greek_lsr_1: Open-Source Automatic Speech Recognition Model for Greek Speech (Free to Use)


Greek Lsr 1

Developed by skylord

An automatic speech recognition model for Greek, fine-tuned from facebook/wav2vec2-large-xlsr-53.

Speech Recognition

Transformers

Other

Open Source License: Apache-2.0

#Greek speech recognition #XLSR fine-tuning #Low-resource optimization


Release Time: 3/2/2022

Model Overview

This is an automatic speech recognition (ASR) model optimized for Greek, based on Facebook's XLSR-53 architecture and fine-tuned on the Common Voice Greek dataset.

Model Features

Greek language optimization

Specifically fine-tuned for Greek speech recognition tasks

Large model architecture

Based on the large XLSR-53 architecture, a wav2vec 2.0 model pretrained on unlabeled speech in 53 languages, whose learned speech representations are reused for Greek

No language model required

Transcripts come from greedy CTC decoding of the model output, so no external language model is needed
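Because no language model is involved, decoding reduces to greedy CTC: take the most likely token in each frame, merge consecutive repeats, and drop the blank token. This is roughly what the `torch.argmax` plus `processor.batch_decode` steps in the usage examples do. The plain-Python sketch below illustrates that logic; the toy vocabulary and the use of index 0 as the blank are assumptions for illustration only (the real checkpoint ships its own vocabulary, and Hugging Face wav2vec2 tokenizers use the pad token as the CTC blank).

```python
from typing import List, Sequence

# Hypothetical toy vocabulary: index 0 acts as the CTC blank and "|" is the
# word delimiter. The real checkpoint defines its own vocabulary.
TOY_VOCAB = ["<pad>", "|", "γ", "ε", "ι", "α"]
BLANK_ID = 0


def argmax_per_frame(logits: Sequence[Sequence[float]]) -> List[int]:
    """Pick the highest-scoring token ID in each frame (greedy, no language model)."""
    return [max(range(len(frame)), key=frame.__getitem__) for frame in logits]


def ctc_greedy_decode(ids: Sequence[int], vocab: Sequence[str], blank_id: int = BLANK_ID) -> str:
    """Merge consecutive repeated IDs, drop blanks, and turn '|' into spaces."""
    chars = []
    prev = None
    for i in ids:
        if i != prev and i != blank_id:
            chars.append(vocab[i])
        prev = i
    return "".join(chars).replace("|", " ").strip()


# Frame-level IDs for "γεια": repeats are merged and blanks are removed
print(ctc_greedy_decode([2, 2, 0, 3, 4, 4, 0, 5], TOY_VOCAB))  # γεια
```

Note that a genuinely doubled letter needs a blank between its two frames (for example `[5, 0, 5]`), otherwise the repeat is merged into a single character.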

Model Capabilities

Greek speech recognition

16 kHz audio input
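The 16 kHz requirement matters because the convolutional feature encoder turns raw samples into frames at a fixed rate. Assuming this checkpoint keeps the default wav2vec 2.0 encoder settings from `Wav2Vec2Config` (`conv_kernel=(10, 3, 3, 3, 3, 2, 2)`, `conv_stride=(5, 2, 2, 2, 2, 2, 2)`), the total stride is 320 samples, which is about one frame per 20 ms of 16 kHz audio. The sketch below computes the frame count with the per-layer formula `(length - kernel) // stride + 1`.

```python
# Default wav2vec 2.0 feature-encoder settings (assumed unchanged in this checkpoint)
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)


def encoder_output_frames(num_samples: int) -> int:
    """Number of frames the convolutional feature encoder emits (no padding)."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        length = (length - kernel) // stride + 1
    return length


def total_stride() -> int:
    """Product of all strides: how many input samples one frame advances."""
    product = 1
    for stride in CONV_STRIDES:
        product *= stride
    return product


for seconds in (1, 5):
    n = 16_000 * seconds
    print(f"{seconds} s at 16 kHz -> {n} samples -> {encoder_output_frames(n)} frames")
```

If 48 kHz audio were passed in without resampling, each 320-sample frame would span only about 6.7 ms instead of 20 ms, which does not match what the model was trained on.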

Use Cases

Speech-to-text

Greek speech transcription

Convert Greek speech content into text

Achieves a word error rate (WER) of 56.25% on the Common Voice Greek test set

🚀 Wav2Vec2-Large-XLSR-53-Greek

This model is a version of facebook/wav2vec2-large-xlsr-53 fine-tuned for automatic speech recognition of Greek.

🚀 Quick Start

Fine-tuned facebook/wav2vec2-large-xlsr-53 on Greek using the Common Voice dataset. When using this model, make sure that your speech input is sampled at 16 kHz.

✨ Features

Multilingual pretraining: The XLSR-53 base model was pretrained on unlabeled speech in 53 languages, which gives the Greek fine-tuning a cross-lingual starting point.
Greek speech recognition: Fine-tuned on Greek Common Voice data for speech-to-text conversion, reaching 56.25% WER on the Common Voice Greek test set.

📦 Installation

The usage examples below require torch, torchaudio, datasets, and transformers (for example, `pip install torch torchaudio datasets transformers`).

💻 Usage Examples

Basic Usage

import torch
import torchaudio
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
test_dataset = load_dataset("common_voice", "el", split="test[:2%]")

processor = Wav2Vec2Processor.from_pretrained("skylord/greek_lsr_1") 
model = Wav2Vec2ForCTC.from_pretrained("skylord/greek_lsr_1") 

# Common Voice clips are distributed at 48 kHz; the model expects 16 kHz input
resampler = torchaudio.transforms.Resample(48_000, 16_000)

# Preprocessing the datasets.
# Read the audio files as arrays and resample them to 16 kHz
def speech_file_to_array_fn(batch):
  speech_array, sampling_rate = torchaudio.load(batch["path"])
  batch["speech"] = resampler(speech_array).squeeze().numpy()
  return batch

test_dataset = test_dataset.map(speech_file_to_array_fn)
inputs = processor(test_dataset["speech"][:2], sampling_rate=16_000, return_tensors="pt", padding=True)

with torch.no_grad():
  logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits
  
predicted_ids = torch.argmax(logits, dim=-1)
print("Prediction:", processor.batch_decode(predicted_ids))
print("Reference:", test_dataset["sentence"][:2])
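In the Basic Usage example, `processor(..., padding=True)` batches clips of different lengths. The NumPy sketch below approximates what the feature extractor does when normalization and attention masks are enabled (assumed here for this checkpoint): each clip is normalized to zero mean and unit variance over its real samples, right-padded with zeros, and marked with 1s in the attention mask. The function `normalize_and_pad` is hypothetical; the `Wav2Vec2FeatureExtractor` source is the authority on exact behavior.

```python
import numpy as np
from typing import List, Tuple


def normalize_and_pad(waveforms: List[np.ndarray]) -> Tuple[np.ndarray, np.ndarray]:
    """Normalize each clip, right-pad to the longest clip, and build an attention mask."""
    max_len = max(len(w) for w in waveforms)
    input_values = np.zeros((len(waveforms), max_len), dtype=np.float32)  # padding value 0.0
    attention_mask = np.zeros((len(waveforms), max_len), dtype=np.int64)
    for row, clip in enumerate(waveforms):
        clip = np.asarray(clip, dtype=np.float32)
        # Zero mean, unit variance over the real samples only (epsilon avoids division by zero)
        normed = (clip - clip.mean()) / np.sqrt(clip.var() + 1e-7)
        input_values[row, : len(clip)] = normed
        attention_mask[row, : len(clip)] = 1  # 1 = real sample, 0 = padding
    return input_values, attention_mask


# Two toy clips of different lengths (not real audio)
values, mask = normalize_and_pad([np.array([0.0, 1.0, 2.0, 3.0]), np.array([1.0, -1.0])])
print(values.shape)
print(mask)
```

The attention mask lets the model ignore the zero padding, which is why the usage examples pass `attention_mask=inputs.attention_mask` alongside `input_values`.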

Advanced Usage

import torch
import torchaudio
from datasets import load_dataset, load_metric
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
import re

test_dataset = load_dataset("common_voice", "el", split="test") 
wer = load_metric("wer")

processor = Wav2Vec2Processor.from_pretrained("skylord/greek_lsr_1") 
model = Wav2Vec2ForCTC.from_pretrained("skylord/greek_lsr_1")
model.to("cuda")

chars_to_ignore_regex = r'[,?.!\-;:"“]'  # punctuation removed from the reference sentences
resampler = torchaudio.transforms.Resample(48_000, 16_000)

# Preprocessing the datasets.
# Normalize the reference text, then read the audio files as arrays resampled to 16 kHz

def speech_file_to_array_fn(batch):
  batch["sentence"] = re.sub(chars_to_ignore_regex, '', batch["sentence"]).lower()
  speech_array, sampling_rate = torchaudio.load(batch["path"])
  batch["speech"] = resampler(speech_array).squeeze().numpy()
  return batch
  
test_dataset = test_dataset.map(speech_file_to_array_fn)

# Run batched inference on the GPU and decode the predicted token IDs

def evaluate(batch):
  inputs = processor(batch["speech"], sampling_rate=16_000, return_tensors="pt", padding=True)
  with torch.no_grad():
    logits = model(inputs.input_values.to("cuda"), attention_mask=inputs.attention_mask.to("cuda")).logits
  pred_ids = torch.argmax(logits, dim=-1)
  batch["pred_strings"] = processor.batch_decode(pred_ids)
  return batch

result = test_dataset.map(evaluate, batched=True, batch_size=8)
print("WER: {:2f}".format(100 * wer.compute(predictions=result["pred_strings"], references=result["sentence"])))

Test Result (WER on the full Common Voice Greek test split): 56.253154 %
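The evaluation scores normalized references: each reference sentence has the characters matched by `chars_to_ignore_regex` removed and is then lowercased. The hypothetical helper below isolates that step with the same punctuation set so it can be applied consistently to your own references. Characters outside the set, such as the Greek guillemets « », are kept.

```python
import re

# Same punctuation set as chars_to_ignore_regex in the Advanced Usage example
CHARS_TO_IGNORE_REGEX = r'[,?.!\-;:"“]'


def normalize_transcript(text: str) -> str:
    """Remove the ignored punctuation, then lowercase, as done for the references."""
    return re.sub(CHARS_TO_IGNORE_REGEX, "", text).lower()


print(normalize_transcript("Καλημέρα, κόσμε!"))  # καλημέρα κόσμε
```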

📚 Documentation

Evaluation

The model can be evaluated on the Greek test split of Common Voice. The Advanced Usage example above shows the full evaluation process.
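The word error rate counts the word-level substitutions S, deletions D, and insertions I needed to turn the predictions into the references, divided by the number of reference words N, summed over the whole test set: WER = (S + D + I) / N. The evaluation example delegates this to the `wer` metric. The from-scratch sketch below implements the same definition with a word-level Levenshtein distance; it is not the metric's actual implementation, so edge cases such as empty references may be handled differently.

```python
from typing import List, Sequence


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance over words (substitutions, deletions, insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution (or match at no cost)
            )
        prev = curr
    return prev[-1]


def corpus_wer(references: List[str], predictions: List[str]) -> float:
    """Total word edits over all pairs divided by total reference words."""
    errors = sum(word_edit_distance(r.split(), p.split()) for r, p in zip(references, predictions))
    total = sum(len(r.split()) for r in references)
    return errors / total


refs = ["το σπίτι είναι μεγάλο"]
preds = ["το σπίτι ήταν μεγάλο"]
print(f"WER: {100 * corpus_wer(refs, preds):.2f}%")  # WER: 25.00%
```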

Training

The Common Voice `train` and `validation` splits were used for training; any additional training datasets and the training script are not specified.
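Since no training script is linked for this model, its exact preprocessing is unknown. For context, XLSR fine-tuning at the time commonly followed the Hugging Face XLSR-Wav2Vec2 fine-tuning recipe, which builds a character-level CTC vocabulary from the normalized training transcripts. The sketch below shows that step under this assumption; the function name `build_ctc_vocab` and the special tokens are hypothetical, not taken from this model's training code.

```python
from typing import Dict, Iterable


def build_ctc_vocab(transcripts: Iterable[str]) -> Dict[str, int]:
    """Hypothetical character vocabulary for CTC fine-tuning.

    Every character in the (already normalized) transcripts gets an ID, the
    space is replaced by the word delimiter "|", and "[UNK]" and "[PAD]" are
    appended; "[PAD]" would double as the CTC blank token.
    """
    chars = sorted(set("".join(transcripts)))
    vocab = {c: i for i, c in enumerate(chars)}
    if " " in vocab:
        vocab["|"] = vocab.pop(" ")  # keep the space's ID, rename it to "|"
    vocab["[UNK]"] = len(vocab)
    vocab["[PAD]"] = len(vocab)
    return vocab


vocab = build_ctc_vocab(["γεια σου", "καλή μέρα"])
print(len(vocab), vocab["|"], vocab["[PAD]"])
```

Using "|" instead of a raw space makes word boundaries an explicit token the CTC head can predict, which the tokenizer converts back to spaces when decoding.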

📄 License

This model is licensed under the Apache-2.0 license.

📦 Model Information

Property	Details
Model Type	Wav2Vec2-Large-XLSR-53-Greek
Training Data	Common Voice `train` and `validation` splits (additional datasets not specified)
Metrics	WER (Word Error Rate)
License	Apache-2.0
