🚀 wav2vec2-large-xls-r-300m-br-d10
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - BR dataset. It is intended for automatic speech recognition (ASR) in Breton (br).
✨ Features
- Fine-tuned Model: Based on the pre-trained facebook/wav2vec2-xls-r-300m and fine-tuned on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - BR dataset.
- Evaluation Metrics: Reports word error rate (WER) and character error rate (CER) on the Common Voice 8.0 Breton test set (see the Model Index below); an illustrative metric computation follows this list.
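To make the metrics concrete, here is a minimal sketch of computing WER and CER with the `jiwer` package. This is an assumed illustration with hypothetical strings; the card's official numbers come from the eval.py script shown under Evaluation Commands, not from this snippet.

```python
# Illustrative WER/CER computation using the jiwer package (assumed example;
# the card's reported numbers come from eval.py, not this snippet).
import jiwer

reference = "demat d'an holl"    # hypothetical ground-truth transcription
hypothesis = "demat d an holl"   # hypothetical model output

print(f"WER: {jiwer.wer(reference, hypothesis):.4f}")  # word error rate
print(f"CER: {jiwer.cer(reference, hypothesis):.4f}")  # character error rate
```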
📦 Installation
The original document does not list installation steps. A typical environment for running this model with 🤗 Transformers can be set up as sketched below (an assumption, not taken from the source).
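```bash
# Assumed setup for wav2vec2 ASR inference (not from the original card)
pip install transformers datasets torch torchaudio
```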
💻 Usage Examples
The original document provides no code examples. The sketch below shows a minimal greedy-decoding transcription pipeline, assuming the standard Wav2Vec2 CTC interface and 16 kHz input audio; it is an illustration, not code from the card.
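```python
# Minimal transcription sketch (assumed usage; not from the original card).
# wav2vec2 models expect 16 kHz mono audio.
import torch
from datasets import Audio, load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "DrishtiSharma/wav2vec2-large-xls-r-300m-br-d10"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# Load one Breton test sample (this gated dataset may require authentication).
ds = load_dataset("mozilla-foundation/common_voice_8_0", "br",
                  split="test", streaming=True)
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))
sample = next(iter(ds))

inputs = processor(sample["audio"]["array"], sampling_rate=16_000,
                   return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding: take the most likely token at each frame.
pred_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(pred_ids)[0])
```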
📚 Documentation
Model Information
| Property | Details |
|----------|---------|
| Language | Breton (br) |
| License | Apache-2.0 |
| Tags | generated_from_trainer, robust-speech-event, hf-asr-leaderboard |
| Datasets | mozilla-foundation/common_voice_8_0 |
| Metrics | wer |
Model Index
- Name: wav2vec2-large-xls-r-300m-br-d10
- Results:
  - Task 1:
    - Task Type: automatic-speech-recognition
    - Dataset: mozilla-foundation/common_voice_8_0 (Common Voice 8, br)
    - Metrics:
      - Test WER: 0.5230357484228637
      - Test CER: 0.1880661144228536
  - Task 2:
    - Task Type: automatic-speech-recognition
    - Dataset: speech-recognition-community-v2/dev_data (Robust Speech Event - Dev Data, br)
    - Metrics:
      - Test WER: NA
      - Test CER: NA
Evaluation Results
It achieves the following results on the evaluation set (the final validation step of the training log below):
- Loss: 1.1382
- Wer: 0.4895
Evaluation Commands
- To evaluate on
mozilla - foundation/common_voice_8_0
with test split
python eval.py --model_id DrishtiSharma/wav2vec2-large-xls-r-300m-br-d10 --dataset mozilla-foundation/common_voice_8_0 --config br --split test --log_outputs
- To evaluate on
speech - recognition - community - v2/dev_data
Breton language isn't available in speech - recognition - community - v2/dev_data
Training Hyperparameters
The following hyperparameters were used during training (a hedged sketch of the corresponding TrainingArguments follows the list):
- learning_rate: 0.0004
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 800
- num_epochs: 50
- mixed_precision_training: Native AMP
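For readers who want to reproduce this configuration, a minimal sketch mapping the values above onto 🤗 TrainingArguments might look like the following. This is an assumed reconstruction; the original training script is not included in the card, and the output directory name is hypothetical.

```python
# Assumed reconstruction of the listed hyperparameters as TrainingArguments;
# the actual training script is not part of the original card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec2-large-xls-r-300m-br-d10",  # hypothetical output path
    learning_rate=4e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,  # effective total train batch size: 32
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=800,
    num_train_epochs=50,
    fp16=True,  # native AMP mixed-precision training
)
```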
Training Results
| Training Loss | Epoch | Step | Validation Loss | WER |
|---------------|-------|------|-----------------|--------|
| 13.611 | 0.68 | 100 | 5.8492 | 1.0 |
| 3.8176 | 1.35 | 200 | 3.2181 | 1.0 |
| 3.0457 | 2.03 | 300 | 3.0902 | 1.0 |
| 2.2632 | 2.7 | 400 | 1.4882 | 0.9426 |
| 1.1965 | 3.38 | 500 | 1.1396 | 0.7950 |
| 0.984 | 4.05 | 600 | 1.0216 | 0.7583 |
| 0.8036 | 4.73 | 700 | 1.0258 | 0.7202 |
| 0.7061 | 5.41 | 800 | 0.9710 | 0.6820 |
| 0.689 | 6.08 | 900 | 0.9731 | 0.6488 |
| 0.6063 | 6.76 | 1000 | 0.9442 | 0.6569 |
| 0.5215 | 7.43 | 1100 | 1.0221 | 0.6671 |
| 0.4965 | 8.11 | 1200 | 0.9266 | 0.6181 |
| 0.4321 | 8.78 | 1300 | 0.9050 | 0.5991 |
| 0.3762 | 9.46 | 1400 | 0.9801 | 0.6134 |
| 0.3747 | 10.14 | 1500 | 0.9210 | 0.5747 |
| 0.3554 | 10.81 | 1600 | 0.9720 | 0.6051 |
| 0.3148 | 11.49 | 1700 | 0.9672 | 0.6099 |
| 0.3176 | 12.16 | 1800 | 1.0120 | 0.5966 |
| 0.2915 | 12.84 | 1900 | 0.9490 | 0.5653 |
| 0.2696 | 13.51 | 2000 | 0.9394 | 0.5819 |
| 0.2569 | 14.19 | 2100 | 1.0197 | 0.5667 |
| 0.2395 | 14.86 | 2200 | 0.9771 | 0.5608 |
| 0.2367 | 15.54 | 2300 | 1.0516 | 0.5678 |
| 0.2153 | 16.22 | 2400 | 1.0097 | 0.5679 |
| 0.2092 | 16.89 | 2500 | 1.0143 | 0.5430 |
| 0.2046 | 17.57 | 2600 | 1.0884 | 0.5631 |
| 0.1937 | 18.24 | 2700 | 1.0113 | 0.5648 |
| 0.1752 | 18.92 | 2800 | 1.0056 | 0.5470 |
| 0.164 | 19.59 | 2900 | 1.0340 | 0.5508 |
| 0.1723 | 20.27 | 3000 | 1.0743 | 0.5615 |
| 0.1535 | 20.95 | 3100 | 1.0495 | 0.5465 |
| 0.1432 | 21.62 | 3200 | 1.0390 | 0.5333 |
| 0.1561 | 22.3 | 3300 | 1.0798 | 0.5590 |
| 0.1384 | 22.97 | 3400 | 1.1716 | 0.5449 |
| 0.1359 | 23.65 | 3500 | 1.1154 | 0.5420 |
| 0.1356 | 24.32 | 3600 | 1.0883 | 0.5387 |
| 0.1355 | 25.0 | 3700 | 1.1114 | 0.5504 |
| 0.1158 | 25.68 | 3800 | 1.1171 | 0.5388 |
| 0.1166 | 26.35 | 3900 | 1.1335 | 0.5403 |
| 0.1165 | 27.03 | 4000 | 1.1374 | 0.5248 |
| 0.1064 | 27.7 | 4100 | 1.0336 | 0.5298 |
| 0.0987 | 28.38 | 4200 | 1.0407 | 0.5216 |
| 0.104 | 29.05 | 4300 | 1.1012 | 0.5350 |
| 0.0894 | 29.73 | 4400 | 1.1016 | 0.5310 |
| 0.0912 | 30.41 | 4500 | 1.1383 | 0.5302 |
| 0.0972 | 31.08 | 4600 | 1.0851 | 0.5214 |
| 0.0832 | 31.76 | 4700 | 1.1705 | 0.5311 |
| 0.0859 | 32.43 | 4800 | 1.0750 | 0.5192 |
| 0.0811 | 33.11 | 4900 | 1.0900 | 0.5180 |
| 0.0825 | 33.78 | 5000 | 1.1271 | 0.5196 |
| 0.07 | 34.46 | 5100 | 1.1289 | 0.5141 |
| 0.0689 | 35.14 | 5200 | 1.0960 | 0.5101 |
| 0.068 | 35.81 | 5300 | 1.1377 | 0.5050 |
| 0.0776 | 36.49 | 5400 | 1.0880 | 0.5194 |
| 0.0642 | 37.16 | 5500 | 1.1027 | 0.5076 |
| 0.0607 | 37.84 | 5600 | 1.1293 | 0.5119 |
| 0.0607 | 38.51 | 5700 | 1.1229 | 0.5103 |
| 0.0545 | 39.19 | 5800 | 1.1168 | 0.5103 |
| 0.0562 | 39.86 | 5900 | 1.1206 | 0.5073 |
| 0.0484 | 40.54 | 6000 | 1.1710 | 0.5019 |
| 0.0499 | 41.22 | 6100 | 1.1511 | 0.5100 |
| 0.0455 | 41.89 | 6200 | 1.1488 | 0.5009 |
| 0.0475 | 42.57 | 6300 | 1.1196 | 0.4944 |
| 0.0413 | 43.24 | 6400 | 1.1654 | 0.4996 |
| 0.0389 | 43.92 | 6500 | 1.0961 | 0.4930 |
| 0.0428 | 44.59 | 6600 | 1.0955 | 0.4938 |
| 0.039 | 45.27 | 6700 | 1.1323 | 0.4955 |
| 0.0352 | 45.95 | 6800 | 1.1040 | 0.4930 |
| 0.0334 | 46.62 | 6900 | 1.1382 | 0.4942 |
| 0.0338 | 47.3 | 7000 | 1.1264 | 0.4911 |
| 0.0307 | 47.97 | 7100 | 1.1216 | 0.4881 |
| 0.0286 | 48.65 | 7200 | 1.1459 | 0.4894 |
| 0.0348 | 49.32 | 7300 | 1.1419 | 0.4906 |
| 0.0329 | 50.0 | 7400 | 1.1382 | 0.4895 |
Framework Versions
- Transformers 4.16.2
- Pytorch 1.10.0+cu111
- Datasets 1.18.3
- Tokenizers 0.11.0
📄 License
This model is released under the Apache-2.0 license.