XLS-R 300M PT Model
This is an XLS-R 300M model fine-tuned on the Portuguese subset of Mozilla Foundation's Common Voice 8.0 dataset. It is designed for automatic speech recognition and achieves competitive results on multiple evaluation datasets.
Quick Start
This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - PT dataset.
It achieves the following results on the evaluation set (the final checkpoint in the training results table below):
- Loss: 0.2290
- Wer: 0.2382
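A minimal inference sketch using the Transformers ASR pipeline. The Hub repository id and the audio file path below are placeholders, not values from this card; substitute the actual repository id for this checkpoint.

```python
# Sketch: transcribing Portuguese audio with this model via the Transformers
# ASR pipeline. The model id and audio path are placeholders (assumptions).
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="<hub-username>/xls-r-300m-pt",  # placeholder repo id (assumption)
)

# wav2vec2-style models expect 16 kHz mono audio; the pipeline resamples
# automatically when given a file path.
result = asr("example_pt.wav")  # hypothetical audio file
print(result["text"])
```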
Features
- Multilingual Adaptability: built on the XLS-R architecture, so it can potentially be adapted to other languages.
- Fine-Tuned for Portuguese: specifically optimized for Portuguese speech recognition using the Common Voice 8.0 dataset.
- Competitive Metrics: achieves good WER and CER scores on both the Common Voice 8.0 and Robust Speech Event datasets (a metric-computation sketch follows this list).
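To score transcriptions with the same metrics reported in this card, a hedged sketch using the Hugging Face `evaluate` library (assumes `evaluate` and `jiwer` are installed; the reference and prediction strings are purely illustrative, not drawn from the dataset):

```python
# Sketch: computing WER and CER for ASR outputs.
# Requires: pip install evaluate jiwer
import evaluate

wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

# Illustrative Portuguese transcripts and model outputs (assumptions).
references = ["o gato dorme no sofá", "bom dia a todos"]
predictions = ["o gato dorme no sofa", "bom dia todos"]

wer = wer_metric.compute(predictions=predictions, references=references)
cer = cer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.3f}  CER: {cer:.3f}")
```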
Documentation
Model Index
| Model Name | Task | Dataset | Metrics |
|---|---|---|---|
| xls-r-300m-pt | Speech Recognition (automatic-speech-recognition) | Common Voice 8.0 pt (mozilla-foundation/common_voice_8_0, args: pt) | Test WER: 19.361, Test CER: 5.533 |
| xls-r-300m-pt | Speech Recognition (automatic-speech-recognition) | Robust Speech Event - Dev Data (speech-recognition-community-v2/dev_data, args: fr) | Validation WER: 47.812, Validation CER: 18.805 |
| xls-r-300m-pt | Automatic Speech Recognition (automatic-speech-recognition) | Common Voice 8.0 (mozilla-foundation/common_voice_8_0, args: pt) | Test WER: 19.36 |
| xls-r-300m-pt | Automatic Speech Recognition (automatic-speech-recognition) | Robust Speech Event - Dev Data (speech-recognition-community-v2/dev_data, args: pt) | Test WER: 48.01 |
| xls-r-300m-pt | Automatic Speech Recognition (automatic-speech-recognition) | Robust Speech Event - Test Data (speech-recognition-community-v2/eval_data, args: pt) | Test WER: 49.21 |
Training Procedure
Training Hyperparameters
The following hyperparameters were used during training (see the `TrainingArguments` sketch after this list):
- learning_rate: 0.0002
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1500
- num_epochs: 15.0
- mixed_precision_training: Native AMP
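A minimal sketch of how these values map onto `transformers.TrainingArguments`. Only the listed hyperparameters come from this card; the output directory and everything else about the training setup (model, data, trainer wiring) are assumptions.

```python
# Sketch: the hyperparameters above expressed as transformers.TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./xls-r-300m-pt",    # placeholder output path (assumption)
    learning_rate=2e-4,              # learning_rate: 0.0002
    per_device_train_batch_size=32,  # train_batch_size: 32
    per_device_eval_batch_size=32,   # eval_batch_size: 32
    seed=42,                         # seed: 42
    adam_beta1=0.9,                  # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,               # epsilon=1e-08
    lr_scheduler_type="linear",      # linear schedule
    warmup_steps=1500,               # lr_scheduler_warmup_steps: 1500
    num_train_epochs=15.0,           # num_epochs: 15.0
    fp16=True,                       # mixed_precision_training: Native AMP
)
```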
Training Results
| Training Loss | Epoch | Step | Validation Loss | WER |
|---|---|---|---|---|
| 3.0952 | 0.64 | 500 | 3.0982 | 1.0 |
| 1.7975 | 1.29 | 1000 | 0.7887 | 0.5651 |
| 1.4138 | 1.93 | 1500 | 0.5238 | 0.4389 |
| 1.344 | 2.57 | 2000 | 0.4775 | 0.4318 |
| 1.2737 | 3.21 | 2500 | 0.4648 | 0.4075 |
| 1.2554 | 3.86 | 3000 | 0.4069 | 0.3678 |
| 1.1996 | 4.5 | 3500 | 0.3914 | 0.3668 |
| 1.1427 | 5.14 | 4000 | 0.3694 | 0.3572 |
| 1.1372 | 5.78 | 4500 | 0.3568 | 0.3501 |
| 1.0831 | 6.43 | 5000 | 0.3331 | 0.3253 |
| 1.1074 | 7.07 | 5500 | 0.3332 | 0.3352 |
| 1.0536 | 7.71 | 6000 | 0.3131 | 0.3152 |
| 1.0248 | 8.35 | 6500 | 0.3024 | 0.3023 |
| 1.0075 | 9.0 | 7000 | 0.2948 | 0.3028 |
| 0.979 | 9.64 | 7500 | 0.2796 | 0.2853 |
| 0.9594 | 10.28 | 8000 | 0.2719 | 0.2789 |
| 0.9172 | 10.93 | 8500 | 0.2620 | 0.2695 |
| 0.9047 | 11.57 | 9000 | 0.2537 | 0.2596 |
| 0.8777 | 12.21 | 9500 | 0.2438 | 0.2525 |
| 0.8629 | 12.85 | 10000 | 0.2409 | 0.2493 |
| 0.8575 | 13.5 | 10500 | 0.2366 | 0.2440 |
| 0.8361 | 14.14 | 11000 | 0.2317 | 0.2385 |
| 0.8126 | 14.78 | 11500 | 0.2290 | 0.2382 |
Framework Versions
- Transformers 4.17.0.dev0
- Pytorch 1.10.2+cu102
- Datasets 1.18.2.dev0
- Tokenizers 0.11.0
License
This model is released under the Apache 2.0 license.