🚀 Akashpb13/Galician_xlsr
This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the MOZILLA-FOUNDATION/COMMON_VOICE_7_0 - GL dataset. It offers high performance on automatic speech recognition tasks, especially for Galician.
✨ Features
- Automatic Speech Recognition: Capable of accurately transcribing speech, as demonstrated by its performance on multiple datasets.
- Trained on Specific Datasets: Utilizes the mozilla-foundation/common_voice_8_0 dataset for training, ensuring robustness in real-world scenarios.
📦 Installation
The original model card does not list explicit installation steps.
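As a plausible setup (not taken from the model card), the checkpoint can typically be loaded with the Hugging Face Transformers stack. The pip packages, the placeholder audio path, and the pre/post-processing steps below are assumptions, offered only as a sketch:

```python
# Assumed dependencies: pip install transformers torch torchaudio
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Load the fine-tuned Galician checkpoint and its processor
processor = Wav2Vec2Processor.from_pretrained("Akashpb13/Galician_xlsr")
model = Wav2Vec2ForCTC.from_pretrained("Akashpb13/Galician_xlsr")

# Transcribe a local audio file (placeholder path); resample to 16 kHz if needed
waveform, sample_rate = torchaudio.load("sample.wav")
if sample_rate != 16_000:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)

inputs = processor(waveform.squeeze(0), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```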
📚 Documentation
Model description
The base model "facebook/wav2vec2-xls-r-300m" was fine-tuned to adapt it to the Galician language and specific speech recognition tasks.
Intended uses & limitations
More information needed.
Training and evaluation data
The training data consists of the Common Voice Galician train.tsv, dev.tsv, invalidated.tsv, reported.tsv, and other.tsv files. Only data points with more upvotes than downvotes were kept, and duplicates were removed after concatenating all of the Common Voice 7.0 splits.
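As a rough illustration of this filtering step (not the author's actual script), the sketch below assumes the standard Common Voice TSV layout with `up_votes`, `down_votes`, and `path` columns:

```python
import pandas as pd

# Concatenate the Common Voice 7.0 Galician splits listed above
splits = ["train.tsv", "dev.tsv", "invalidated.tsv", "reported.tsv", "other.tsv"]
df = pd.concat([pd.read_csv(s, sep="\t") for s in splits], ignore_index=True)

# Keep only clips with more upvotes than downvotes, then drop duplicate clips
df = df[df["up_votes"] > df["down_votes"]]
df = df.drop_duplicates(subset=["path"]).reset_index(drop=True)
```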
Training procedure
To create the training dataset, all of the splits listed above were appended and a 90-10 train-validation split was applied.
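A minimal sketch of that 90-10 split, continuing from the filtered dataframe above; the splitting utility and the random seed are assumptions, since the card does not include the preprocessing script:

```python
from sklearn.model_selection import train_test_split

# 90-10 split of the filtered dataframe built in the previous sketch
train_df, eval_df = train_test_split(df, test_size=0.1, random_state=13, shuffle=True)
print(len(train_df), len(eval_df))
```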
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.000096
- train_batch_size: 16
- eval_batch_size: 16
- seed: 13
- gradient_accumulation_steps: 2
- lr_scheduler_type: cosine_with_restarts
- lr_scheduler_warmup_steps: 500
- num_epochs: 100
- mixed_precision_training: Native AMP
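These hyperparameters map naturally onto Hugging Face `TrainingArguments`. The sketch below is an illustrative reconstruction, not the author's training script; the output directory and the use of the `fp16` flag for Native AMP are assumptions:

```python
from transformers import TrainingArguments

# Mirrors the hyperparameters listed above
training_args = TrainingArguments(
    output_dir="./galician_xlsr",        # placeholder path
    learning_rate=0.000096,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=13,
    gradient_accumulation_steps=2,
    lr_scheduler_type="cosine_with_restarts",
    warmup_steps=500,
    num_train_epochs=100,
    fp16=True,                           # Native AMP mixed-precision training
)
```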
Training results
| Step | Training Loss | Validation Loss | WER |
|------|---------------|-----------------|-----|
| 500  | 5.038100 | 3.035432 | 1.000000 |
| 1000 | 2.180000 | 0.406300 | 0.557964 |
| 1500 | 0.331700 | 0.153797 | 0.262394 |
| 2000 | 0.171600 | 0.145268 | 0.235627 |
| 2500 | 0.125900 | 0.136622 | 0.228087 |
| 3000 | 0.105400 | 0.131650 | 0.224128 |
| 3500 | 0.087600 | 0.141032 | 0.217531 |
| 4000 | 0.078300 | 0.143675 | 0.214515 |
| 4500 | 0.070000 | 0.144607 | 0.208106 |
| 5000 | 0.061500 | 0.135259 | 0.202828 |
| 5500 | 0.055600 | 0.130638 | 0.203959 |
| 6000 | 0.050500 | 0.137416 | 0.202451 |
| 6500 | 0.046600 | 0.140379 | 0.200000 |
| 7000 | 0.040800 | 0.140179 | 0.200377 |
| 7500 | 0.041000 | 0.138089 | 0.196795 |
| 8000 | 0.038400 | 0.136927 | 0.197172 |
Framework versions
- Transformers 4.16.0.dev0
- Pytorch 1.10.0+cu102
- Datasets 1.18.3
- Tokenizers 0.10.3
Evaluation Commands
- To evaluate on mozilla-foundation/common_voice_8_0 with split test:

```bash
python eval.py --model_id Akashpb13/Galician_xlsr --dataset mozilla-foundation/common_voice_8_0 --config gl --split test
```
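The WER column in the results table is the word error rate. Independently of `eval.py`, the metric itself can be computed with the `evaluate` library, as in this toy example (the sentences are placeholders, not evaluation data from the card):

```python
import evaluate

wer_metric = evaluate.load("wer")

# Compare hypothetical model predictions against reference transcripts
references = ["ola como estás", "bo día"]
predictions = ["ola como estas", "bo día"]
print(wer_metric.compute(references=references, predictions=predictions))
```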
🔧 Technical Details
The model was fine-tuned on Galician Common Voice data using the hyperparameters listed above. All available splits were concatenated, filtered by vote count, and divided with a 90-10 train-validation split. The learning rate, batch size, and other hyperparameters were selected to optimize the model's performance.
📄 License
The model is licensed under the Apache-2.0 license.
| Property | Details |
|----------|---------|
| Model Type | Automatic Speech Recognition |
| Training Data | mozilla-foundation/common_voice_8_0 |
| License | Apache-2.0 |