🚀 wav2vec2-xls-r-1b-ca-lm
This model is a fine-tuned version of facebook/wav2vec2-xls-r-1b on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - CA, tv3_parla, and parlament_parla datasets. It is designed for automatic speech recognition, providing high-quality speech-to-text conversion in Catalan.
🚀 Quick Start
This model can be used for speech recognition tasks. It loads through the standard Hugging Face Transformers API; see the library documentation for details, or the sketch below.
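A minimal usage sketch, assuming the Transformers `pipeline` API; the model identifier is a placeholder for this model's actual repository path, and `audio.wav` stands in for any audio file:

```python
from transformers import pipeline

# Load the model through the ASR pipeline and transcribe a file.
# "<org>/wav2vec2-xls-r-1b-ca-lm" is a placeholder repository path.
asr = pipeline("automatic-speech-recognition", model="<org>/wav2vec2-xls-r-1b-ca-lm")
result = asr("audio.wav")  # the pipeline decodes and resamples the audio for you
print(result["text"])
```

Since the model name ends in `-lm`, it presumably ships with an n-gram language model; the pipeline uses it automatically when the repository contains a `Wav2Vec2ProcessorWithLM` (this requires `pyctcdecode` and `kenlm` to be installed).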
✨ Features
- Fine-tuned on Multiple Datasets: The model is fine-tuned on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - CA, tv3_parla, and parlament_parla datasets, which helps it adapt to a range of Catalan speech scenarios.
- Low Error Rates: As the evaluation results show, it achieves relatively low Word Error Rate (WER) and Character Error Rate (CER) across multiple datasets, indicating high recognition accuracy.
📚 Documentation
Model description
Please check the original facebook/wav2vec2-xls-r-1b model card; this model is simply a fine-tuned version of it.
Intended uses & limitations
As with any model trained on crowdsourced data, this model can reflect the biases and particularities of its training data. Moreover, since this is a speech recognition model, it may underperform on some lower-resourced dialects of Catalan.
Training and evaluation data
The data is preprocessed to remove characters not in the Catalan alphabet. Moreover, numbers are verbalized using code provided by @ccoreilly, which can be found in the text/ folder or [here](https://github.com/CollectivaT-dev/catotron-cpu/blob/master/text/numbers_ca.py).
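As an illustration of the character filtering described above, here is a hedged sketch; the exact character set and regex are assumptions, not the original training script:

```python
import re

# Keep lowercase Catalan letters (including accented vowels, ç, ï, ü), the
# interpunct used in "l·l", apostrophes, hyphens, and spaces; drop the rest.
# This character set is an assumption; the original script may differ.
CHARS_TO_REMOVE = re.compile(r"[^a-zàèéíòóúçïü·'\- ]")

def clean_text(text: str) -> str:
    return CHARS_TO_REMOVE.sub("", text.lower()).strip()

print(clean_text("Col·legi d'Arquitectes!"))  # -> "col·legi d'arquitectes"
```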
Training results
Check the Tensorboard tab for the training profile and evaluation results over the course of training. The model was evaluated on the test split of each dataset used during training.
Training hyperparameters
The following hyperparameters were used during training:
| Hyperparameter | Value |
| --- | --- |
| learning_rate | 2e-05 |
| train_batch_size | 8 |
| eval_batch_size | 8 |
| seed | 42 |
| gradient_accumulation_steps | 8 |
| total_train_batch_size | 64 |
| optimizer | Adam with betas=(0.9, 0.999) and epsilon=1e-08 |
| lr_scheduler_type | linear |
| lr_scheduler_warmup_steps | 2000 |
| num_epochs | 10.0 |
| mixed_precision_training | Native AMP |
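For reference, a sketch of how the values above map onto `transformers.TrainingArguments`; `output_dir` is a placeholder, and the Adam betas and epsilon listed above are the optimizer defaults:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec2-xls-r-1b-ca-lm",  # placeholder
    learning_rate=2e-05,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=8,  # 8 * 8 = total_train_batch_size of 64
    lr_scheduler_type="linear",
    warmup_steps=2000,
    num_train_epochs=10.0,
    fp16=True,  # native AMP mixed-precision training
)
```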
Framework versions
| Framework | Version |
| --- | --- |
| Transformers | 4.17.0.dev0 |
| Pytorch | 1.10.2+cu102 |
| Datasets | 1.18.3 |
| Tokenizers | 0.11.0 |
Evaluation Results
| Dataset | Task | Test WER (%) | Test CER (%) |
| --- | --- | --- | --- |
| mozilla-foundation/common_voice_8_0 ca | Speech Recognition | 6.07 | 1.92 |
| projecte-aina/parlament_parla ca | Speech Recognition | 5.14 | 2.02 |
| collectivat/tv3_parla ca | Speech Recognition | 11.21 | 7.32 |
| Robust Speech Event - Catalan Dev Data | Speech Recognition | 22.87 | 13.59 |
| Robust Speech Event - Test Data | Automatic Speech Recognition | 15.41 | N/A |
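As a hedged illustration of how figures like these are computed, a sketch using the `jiwer` library (an assumption; the original evaluation script may differ):

```python
import jiwer

# Toy references/hypotheses; in practice these come from the test split
# transcripts and the model's decoded output.
references = ["bon dia a tothom", "com estàs avui"]
hypotheses = ["bon dia a tothom", "com estas avui"]

print(f"WER: {100 * jiwer.wer(references, hypotheses):.2f}%")  # word error rate
print(f"CER: {100 * jiwer.cer(references, hypotheses):.2f}%")  # character error rate
```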
📄 License
This model is licensed under the Apache-2.0 license.
Thanks
Many thanks to @ccoreilly and @gullabi, who contributed their own resources and knowledge to making this model possible.