wav2vec2-large-xls-r-1b-cv8-mt Open-source Model - Free Automatic Speech Recognition for Maltese

Wav2vec2 Large Xls R 1b Cv8 Mt

Developed by RuudVelo

An automatic speech recognition model based on facebook/wav2vec2-xls-r-1b and fine-tuned on the Common Voice 8 Maltese dataset.

Speech Recognition

Transformers

Open Source License: Apache-2.0 #Maltese speech recognition #Low word error rate #Multilingual support

Downloads 17

Release Time: 3/2/2022

Model Overview

This is an automatic speech recognition (ASR) model optimized for Maltese, based on Facebook's large-scale wav2vec2 XLS-R architecture and fine-tuned on the Common Voice 8 dataset.

Model Features

High-performance Maltese recognition

Achieves a word error rate (WER) of 17.57% and a character error rate (CER) of 3.86% on the Common Voice 8 Maltese test set

Based on large-scale pretrained model

Fine-tuned from the facebook/wav2vec2-xls-r-1b model, inheriting its powerful speech feature extraction capabilities

Robust speech processing

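The model card carries the robust-speech-event tag but does not document evaluation under noisy or otherwise varied recording conditions, so robustness beyond the Common Voice 8 test set is not established here.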
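To make the WER and CER figures quoted above concrete, here is a minimal sketch of how such metrics are commonly computed: the Levenshtein edit distance over words (WER) or characters (CER), divided by the length of the reference. The function names and the example sentences are illustrative assumptions; the model card does not say which tool produced its scores.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference item missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra item in hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


# Illustrative sentences only (not taken from Common Voice).
reference = "the cat sat on the mat"
hypothesis = "the cat sit on the mat"
print(f"WER: {wer(reference, hypothesis):.2%}")
print(f"CER: {cer(reference, hypothesis):.2%}")
```

A single wrong letter counts as one full word error but only one character error, which is why a model's CER is typically much lower than its WER.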

Model Capabilities

Maltese speech recognition

Speech-to-text

Robust speech event detection

Use Cases

Speech transcription

Maltese speech transcription

Convert Maltese speech content into text

Word error rate 17.57%, character error rate 3.86%

Voice assistants

Maltese voice interaction

Provide speech recognition capabilities for Maltese voice assistants

🚀 wav2vec2-large-xls-r-1b-cv8-mt

This model is a fine-tuned version of facebook/wav2vec2-xls-r-1b on the common_voice dataset. It performs automatic speech recognition for Maltese, converting speech to text.

🚀 Quick Start

The model is a fine-tuned version of facebook/wav2vec2-xls-r-1b on the common_voice dataset and is intended to be used with the Transformers library.
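The original card gives no usage snippet, so the following is only a sketch of how a wav2vec2 CTC checkpoint is typically run with Transformers. It assumes the checkpoint is published on the Hub as `RuudVelo/wav2vec2-large-xls-r-1b-cv8-mt` (inferred from the developer and model names, not stated in the card), that `torch`, `torchaudio`, and `transformers` are installed, and that the repository ships a standard `Wav2Vec2Processor`. The `transcribe` helper and the file path in the usage comment are hypothetical.

```python
MODEL_ID = "RuudVelo/wav2vec2-large-xls-r-1b-cv8-mt"  # assumed Hub ID, inferred from the card
TARGET_SAMPLE_RATE = 16_000  # XLS-R checkpoints are pretrained on 16 kHz audio


def transcribe(audio_path, model_id=MODEL_ID):
    """Transcribe one audio file with greedy CTC decoding (no language model)."""
    # Heavy dependencies are imported lazily so that importing this file stays cheap.
    import torch
    import torchaudio
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id).eval()

    waveform, sample_rate = torchaudio.load(audio_path)  # shape: (channels, samples)
    waveform = waveform.mean(dim=0)  # downmix to mono
    if sample_rate != TARGET_SAMPLE_RATE:
        waveform = torchaudio.functional.resample(
            waveform, sample_rate, TARGET_SAMPLE_RATE
        )

    inputs = processor(
        waveform.numpy(), sampling_rate=TARGET_SAMPLE_RATE, return_tensors="pt"
    )
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (batch, frames, vocab)

    # Greedy decoding: most likely token per frame; the tokenizer collapses
    # repeated tokens and removes CTC blanks.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]


# Usage (hypothetical file path):
# text = transcribe("sample_maltese.wav")
```

The first call downloads the checkpoint; with roughly one billion parameters, the model needs several gigabytes of disk space and memory.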

✨ Features

Fine-tuned from facebook/wav2vec2-xls-r-1b on the common_voice dataset.
Achieves a loss of 0.2210 and a WER of 0.1974 on the evaluation set.
Another version with a KenLM 3-gram language model is available and performs better.

📚 Documentation

Model description

Note: another version of this model, which adds a KenLM 3-gram language model, is available and performs better than this one. See https://huggingface.co/RuudVelo/wav2vec2-large-xls-r-1b-cv8-mt-lm

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following config and hyperparameters were used during training:

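from transformers import Wav2Vec2ForCTC

# `processor` must already be defined (its creation is not shown in this card).
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-xls-r-1b",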
    attention_dropout=0.05,
    hidden_dropout=0.05,
    feat_proj_dropout=0.05,
    mask_time_prob=0.55,
    mask_feature_prob=0.10,
    layerdrop=0.05,
    ctc_zero_infinity=True,
    ctc_loss_reduction="mean", 
    pad_token_id=processor.tokenizer.pad_token_id,
    vocab_size=len(processor.tokenizer),
)

from transformers import TrainingArguments

training_args = TrainingArguments(
  output_dir=repo_name,
  group_by_length=True,
  per_device_train_batch_size=32,
  gradient_accumulation_steps=2,
  evaluation_strategy="steps",
  num_train_epochs=50,
  gradient_checkpointing=True,
  fp16=True,
  save_steps=400,
  eval_steps=400,
  logging_steps=400,
  learning_rate=5.5e-05, 
  warmup_steps=500,
  save_total_limit=2,
  push_to_hub=True, 
  report_to="tensorboard")

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
3.4564	13.33	400	0.3783	0.3981
0.7931	26.66	800	0.2377	0.2298
0.5364	39.98	1200	0.2210	0.1974

Note that the test WER of 19.74% shown here differs from the 17.57% reported above. The difference is due to a bug found while processing files with an older version of the datasets library. The correct library version is listed below.
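The training arguments and the results table can be cross-checked with a little arithmetic. The sketch below assumes training ran on a single device (the card does not state the hardware) and derives the effective batch size, the approximate number of optimizer steps per epoch, and the step at which the last evaluation would fall if all 50 epochs ran.

```python
# Values copied from the TrainingArguments in the card.
per_device_train_batch_size = 32
gradient_accumulation_steps = 2
num_train_epochs = 50
eval_steps = 400

# Assumption: a single GPU; with N devices the effective batch size scales by N.
num_devices = 1
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)

# First row of the results table: optimizer step 400 was logged at epoch 13.33.
steps_per_epoch = round(400 / 13.33)
total_steps = steps_per_epoch * num_train_epochs

# Evaluation runs every `eval_steps`, so the last evaluation lands on the
# largest multiple of eval_steps that does not exceed total_steps.
last_eval_step = (total_steps // eval_steps) * eval_steps

# Rough estimate only: partial batches and rounding of the logged epoch
# make the true training-set size differ slightly.
approx_train_examples = steps_per_epoch * effective_batch_size

print("effective batch size:", effective_batch_size)
print("optimizer steps per epoch:", steps_per_epoch)
print("total optimizer steps:", total_steps)
print("last evaluation step:", last_eval_step)
print("approx. training examples:", approx_train_examples)
```

Under these assumptions the last evaluation falls at step 1200, which would be consistent with the table ending at epoch 39.98 even though `num_train_epochs` is 50.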

Framework versions

Transformers 4.17.0.dev0
Pytorch 1.10.2+cu102
Datasets 1.18.3
Tokenizers 0.11.0
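Because the note above attributes a score discrepancy to an older datasets release, it can be useful to compare a local environment against the listed versions. This is a minimal sketch using only the standard library; the PyPI package names (`torch` for Pytorch) and the helper names are assumptions, and pre-release or build suffixes such as `.dev0` and `+cu102` are ignored in the comparison.

```python
import re
from importlib import metadata

# Versions listed in the model card (PyPI package name -> version string).
CARD_VERSIONS = {
    "transformers": "4.17.0.dev0",
    "torch": "1.10.2+cu102",
    "datasets": "1.18.3",
    "tokenizers": "0.11.0",
}


def release_tuple(version):
    """Return the leading numeric components, e.g. '1.10.2+cu102' -> (1, 10, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def compare_with_card():
    """Map each package to 'missing', 'older', 'same', or 'newer' relative to the card."""
    report = {}
    for package, listed in CARD_VERSIONS.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = "missing"
            continue
        have, want = release_tuple(installed), release_tuple(listed)
        if have == want:
            report[package] = "same"
        else:
            report[package] = "older" if have < want else "newer"
    return report


for package, status in compare_with_card().items():
    print(f"{package}: {status}")
```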

📄 License

The model is licensed under the Apache-2.0 license.

Property	Details
Model Type	Fine-tuned model on the common_voice dataset, based on facebook/wav2vec2-xls-r-1b
Training Data	mozilla-foundation/common_voice_8_0
License	Apache-2.0
Tags	automatic-speech-recognition, mozilla-foundation/common_voice_8_0, generated_from_trainer, mt, robust-speech-event, model_for_talk, hf-asr-leaderboard
