# 🚀 wav2vec2-large-xls-r-300m-bg-d2
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - BG dataset, and aims to provide high-quality automatic speech recognition for Bulgarian.
## ✨ Features
- Multilingual adaptability: built on the pre-trained wav2vec2-xls-r-300m model, it adapts well to different languages.
- High-precision recognition: achieves a low Word Error Rate (WER) and Character Error Rate (CER) on the evaluation set.
## 📦 Installation
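The original card does not list installation steps. Below is a minimal setup sketch, assuming only the libraries implied by the usage example and the framework-versions section:

```bash
# Assumed setup (not from the original card): install the Hugging Face stack and PyTorch.
pip install transformers datasets torch torchaudio
```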
## 💻 Usage Examples
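The original card ships no usage snippet. Below is a minimal sketch using the 🤗 Transformers ASR pipeline; the file name `speech.wav` is a placeholder, and 16 kHz mono input is assumed (the sampling rate wav2vec2 models expect):

```python
from transformers import pipeline

# Sketch only (not from the original card): load the fine-tuned checkpoint
# through the automatic-speech-recognition pipeline.
asr = pipeline(
    "automatic-speech-recognition",
    model="DrishtiSharma/wav2vec2-large-xls-r-300m-bg-d2",
)

# "speech.wav" is a placeholder path; wav2vec2 expects 16 kHz mono audio.
result = asr("speech.wav")
print(result["text"])
```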
## 📚 Documentation
### Model Information
| Property | Details |
|----------|---------|
| Language | Bulgarian (bg) |
| License | Apache-2.0 |
| Tags | automatic-speech-recognition, bg, generated_from_trainer, hf-asr-leaderboard, mozilla-foundation/common_voice_8_0, robust-speech-event |
| Datasets | mozilla-foundation/common_voice_8_0 |
### Model Evaluation Results
- Common Voice 8 dataset:
  - Test WER: 0.28775471338792613
  - Test CER: 0.06861971204625049
- Robust Speech Event - Dev Data:
  - Test WER: 0.49783147459727384
  - Test CER: 0.1591062599627158
- Robust Speech Event - Test Data: not reported
### Evaluation Commands
- To evaluate on `mozilla-foundation/common_voice_8_0` with the `test` split:

```bash
python eval.py --model_id DrishtiSharma/wav2vec2-large-xls-r-300m-bg-d2 --dataset mozilla-foundation/common_voice_8_0 --config bg --split test --log_outputs
```
- To evaluate on `speech-recognition-community-v2/dev_data`:

```bash
python eval.py --model_id DrishtiSharma/wav2vec2-large-xls-r-300m-bg-d2 --dataset speech-recognition-community-v2/dev_data --config bg --split validation --chunk_length_s 10 --stride_length_s 1
```
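The `eval.py` script above is the card's canonical evaluation path. For illustration only, here is a minimal sketch of computing the reported WER/CER metrics with the 🤗 `evaluate` library; the prediction and reference strings are placeholders:

```python
import evaluate

# Placeholder data: in practice, predictions come from running the model over
# the Common Voice bg test split, and references are its transcripts.
predictions = ["примерна транскрипция"]
references = ["примерна транскрипция"]

wer = evaluate.load("wer")
cer = evaluate.load("cer")
print("WER:", wer.compute(predictions=predictions, references=references))
print("CER:", cer.compute(predictions=predictions, references=references))
```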
### Training Hyperparameters
- learning_rate: 0.00025
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 700
- num_epochs: 35
- mixed_precision_training: Native AMP
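For illustration, a minimal sketch of how the hyperparameters above map onto 🤗 `TrainingArguments`; this is an assumption (the original card does not include the training script), and `output_dir` is a placeholder:

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameter list above.
training_args = TrainingArguments(
    output_dir="./wav2vec2-large-xls-r-300m-bg-d2",  # placeholder path
    learning_rate=2.5e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,  # effective train batch size: 16 * 2 = 32
    lr_scheduler_type="linear",
    warmup_steps=700,
    num_train_epochs=35,
    fp16=True,  # Native AMP mixed-precision training
)
```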
### Training Results
| Training Loss | Epoch | Step | Validation Loss | WER |
|---------------|-------|------|-----------------|-----|
| 6.8791 | 1.74 | 200 | 3.1902 | 1.0 |
| 3.0441 | 3.48 | 400 | 2.8098 | 0.9864 |
| 1.1499 | 5.22 | 600 | 0.4668 | 0.5014 |
| 0.4968 | 6.96 | 800 | 0.4162 | 0.4472 |
| 0.3553 | 8.7 | 1000 | 0.3580 | 0.3777 |
| 0.3027 | 10.43 | 1200 | 0.3422 | 0.3506 |
| 0.2562 | 12.17 | 1400 | 0.3556 | 0.3639 |
| 0.2272 | 13.91 | 1600 | 0.3621 | 0.3583 |
| 0.2125 | 15.65 | 1800 | 0.3436 | 0.3358 |
| 0.1904 | 17.39 | 2000 | 0.3650 | 0.3545 |
| 0.1695 | 19.13 | 2200 | 0.3366 | 0.3241 |
| 0.1532 | 20.87 | 2400 | 0.3550 | 0.3311 |
| 0.1453 | 22.61 | 2600 | 0.3582 | 0.3131 |
| 0.1359 | 24.35 | 2800 | 0.3524 | 0.3084 |
| 0.1233 | 26.09 | 3000 | 0.3503 | 0.2973 |
| 0.1114 | 27.83 | 3200 | 0.3434 | 0.2946 |
| 0.1051 | 29.57 | 3400 | 0.3474 | 0.2956 |
| 0.0965 | 31.3 | 3600 | 0.3426 | 0.2907 |
| 0.0923 | 33.04 | 3800 | 0.3478 | 0.2894 |
| 0.0894 | 34.78 | 4000 | 0.3421 | 0.2860 |
### Framework Versions
- Transformers 4.16.2
- PyTorch 1.10.0+cu111
- Datasets 1.18.3
- Tokenizers 0.11.0
## 🔧 Technical Details
This model is a fine-tuned version of wav2vec2-large-xls-r-300m on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - BG dataset. With the training hyperparameters and optimization strategies described above, it achieves good performance on the Bulgarian automatic speech recognition task.
## 📄 License
This model is released under the Apache-2.0 license.