# wav2vec2-large-xls-r-300m-bg-v1
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the Bulgarian subset of mozilla-foundation/common_voice_8_0, built for automatic speech recognition.
## Quick Start
The model was fine-tuned on the Bulgarian (`bg`) configuration of mozilla-foundation/common_voice_8_0. On the Common Voice 8 test split it reaches a WER of 0.4710 and a CER of 0.1021 (full metrics in the Information Table below).
## Features
- Multilingual adaptability: built on the large-scale pre-trained model wav2vec2-xls-r-300m, it adapts well across languages.
- High-precision recognition: delivers strong automatic speech recognition performance on Bulgarian data.
## Installation
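The original card does not list installation steps. A minimal setup for inference, assuming the package set below (exact pins follow the Framework versions section further down):

```bash
# Assumed minimal dependencies for running the model; torchaudio is used
# in the usage sketch below for loading and resampling audio.
pip install transformers torch torchaudio datasets
```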
## Usage Examples
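The original card ships no code snippet. Below is a minimal inference sketch using the standard `Wav2Vec2ForCTC` / `Wav2Vec2Processor` API from `transformers`; the audio path `sample.wav` is a placeholder, and the model expects 16 kHz mono input:

```python
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

MODEL_ID = "DrishtiSharma/wav2vec2-large-xls-r-300m-bg-v1"

processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)

# Load a local audio file and resample to the 16 kHz rate the model expects.
# "sample.wav" is a placeholder path, not a file shipped with this model.
waveform, sample_rate = torchaudio.load("sample.wav")
waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000).squeeze()

# Run CTC inference and greedy-decode the logits to text.
inputs = processor(waveform, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids))
```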
## Documentation
### Evaluation Commands
- To evaluate on mozilla-foundation/common_voice_8_0 with the test split:

```bash
python eval.py --model_id DrishtiSharma/wav2vec2-large-xls-r-300m-bg-v1 --dataset mozilla-foundation/common_voice_8_0 --config bg --split test --log_outputs
```

- To evaluate on speech-recognition-community-v2/dev_data:

```bash
python eval.py --model_id DrishtiSharma/wav2vec2-large-xls-r-300m-bg-v1 --dataset speech-recognition-community-v2/dev_data --config bg --split validation --chunk_length_s 10 --stride_length_s 1
```
### Training hyperparameters
The following hyperparameters were used during training (a sketch of how they map to `TrainingArguments` follows the list):
- learning_rate: 7e-05
- train_batch_size: 32
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2000
- num_epochs: 50.0
- mixed_precision_training: Native AMP
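A minimal sketch of these settings expressed as `transformers.TrainingArguments`, assuming the standard `Trainer`-based fine-tuning recipe; `output_dir` and the per-device interpretation of the batch sizes are assumptions, not taken from this card:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./wav2vec2-large-xls-r-300m-bg-v1",  # hypothetical output path
    learning_rate=7e-05,
    per_device_train_batch_size=32,  # assumed per-device mapping
    per_device_eval_batch_size=16,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    warmup_steps=2000,
    num_train_epochs=50.0,
    fp16=True,  # "Native AMP" mixed-precision training
)
```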
### Training results

| Training Loss | Epoch | Step | Validation Loss | Wer |
|---------------|-------|------|-----------------|--------|
| 4.3711 | 2.61 | 300 | 4.3122 | 1.0 |
| 3.1653 | 5.22 | 600 | 3.1156 | 1.0 |
| 2.8904 | 7.83 | 900 | 2.8421 | 0.9918 |
| 0.9207 | 10.43 | 1200 | 0.9895 | 0.8689 |
| 0.6384 | 13.04 | 1500 | 0.6994 | 0.7700 |
| 0.5215 | 15.65 | 1800 | 0.5628 | 0.6443 |
| 0.4573 | 18.26 | 2100 | 0.5316 | 0.6174 |
| 0.3875 | 20.87 | 2400 | 0.4932 | 0.5779 |
| 0.3562 | 23.48 | 2700 | 0.4972 | 0.5475 |
| 0.3218 | 26.09 | 3000 | 0.4895 | 0.5219 |
| 0.2954 | 28.7 | 3300 | 0.5226 | 0.5192 |
| 0.287 | 31.3 | 3600 | 0.4957 | 0.5146 |
| 0.2587 | 33.91 | 3900 | 0.4944 | 0.4893 |
| 0.2496 | 36.52 | 4200 | 0.4976 | 0.4895 |
| 0.2365 | 39.13 | 4500 | 0.5185 | 0.4819 |
| 0.2264 | 41.74 | 4800 | 0.5152 | 0.4776 |
| 0.2224 | 44.35 | 5100 | 0.5031 | 0.4746 |
| 0.2096 | 46.96 | 5400 | 0.5062 | 0.4708 |
| 0.2038 | 49.57 | 5700 | 0.5217 | 0.4698 |
### Framework versions
- Transformers 4.17.0.dev0
- Pytorch 1.10.2+cu102
- Datasets 1.18.2.dev0
- Tokenizers 0.11.0
## Technical Details
wav2vec2-large-xls-r-300m-bg-v1 is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the Bulgarian subset of mozilla-foundation/common_voice_8_0. Fine-tuning adapts the multilingual pre-trained representations to the characteristics of Bulgarian speech, thereby improving recognition accuracy.
## License
This model is licensed under the Apache 2.0 license.
## Information Table

| Property | Details |
|----------|---------|
| Model Type | wav2vec2-large-xls-r-300m-bg-v1 |
| Training Data | mozilla-foundation/common_voice_8_0 |
| Task | Automatic Speech Recognition |
| Test WER on Common Voice 8 | 0.4709579127785184 |
| Test CER on Common Voice 8 | 0.10205125354383235 |
| Test WER on Robust Speech Event - Dev Data | 0.7053128872366791 |
| Test CER on Robust Speech Event - Dev Data | 0.210804311998487 |
| Test WER on Robust Speech Event - Test Data | 72.6 |
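For reference, the WER and CER figures above are word- and character-level error rates. A minimal sketch of computing them with the `jiwer` library (an assumption; the original card does not name an evaluation library) on a toy reference/hypothesis pair:

```python
from jiwer import cer, wer  # pip install jiwer

reference = "това е тест"    # ground-truth transcript (toy example)
hypothesis = "това е текст"  # model output (toy example)

print(wer(reference, hypothesis))  # 1 substituted word out of 3 -> ~0.333
print(cer(reference, hypothesis))  # 1 edited character over the reference length
```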