Wav2vec2 Xls R 300m Uk

Developed by robinhad

This is an automatic speech recognition (ASR) model for Ukrainian, fine-tuned from the facebook/wav2vec2-xls-r-300m model. It achieves a 12.22% word error rate (WER) on the Common Voice Ukrainian test set.

Speech Recognition

License: MIT (open source)
Tags: Ukrainian speech recognition, low word error rate, Common Voice dataset

Downloads 72

Release Time: 3/2/2022

Model Overview

This model is specifically designed for automatic speech recognition tasks in Ukrainian, capable of converting Ukrainian speech into text.

Model Features

Low word error rate

Achieves a 12.22% word error rate (WER) on the Common Voice Ukrainian test set.

Based on XLS-R architecture

Built on Facebook's wav2vec2-xls-r-300m checkpoint, a 300M-parameter multilingual speech model pretrained with self-supervision, which supplies the speech feature extraction.

Optimized for Ukrainian

Fine-tuned on Ukrainian speech, so it targets Ukrainian speech recognition scenarios.

Model Capabilities

Ukrainian speech recognition

Speech-to-text

Use Cases

Speech transcription

Ukrainian speech to text

Convert Ukrainian speech content into editable text

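Word error rate of 12.22% (WER = 0.1222) on the Common Voice Ukrainian test set. WER counts substitutions, deletions, and insertions against the reference word count, so it is not simply one minus an accuracy rate.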

Voice assistants

Ukrainian voice assistant

Provide voice interaction functionality for Ukrainian users
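As a rough illustration of the transcription use case, the sketch below wraps the Hugging Face `transformers` ASR pipeline. The Hub identifier `robinhad/wav2vec2-xls-r-300m-uk` is an assumption inferred from the developer name and model name on this page, and `transcribe` is a hypothetical helper, not part of the model's own code. Decoding an audio file by path also assumes `ffmpeg` is installed.

```python
# Assumed Hub ID, inferred from the developer and model name on this page.
MODEL_ID = "robinhad/wav2vec2-xls-r-300m-uk"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file to text (hypothetical helper).

    The import is kept inside the function so that the module can be
    loaded even when transformers is not installed.
    """
    from transformers import pipeline

    # The pipeline loads the model and its processor, decodes the file,
    # and resamples it to the rate the feature extractor expects.
    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(audio_path)
    return result["text"]
```

Calling `transcribe("sample.wav")` on a Ukrainian recording should return the recognized text as a string; the first call downloads the model weights.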

🚀 wav2vec2-xls-r-300m-uk

This is a fine-tuned model for automatic speech recognition in Ukrainian. Its results on the evaluation set are listed below.

🚀 Quick Start

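This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m. The auto-generated card lists the dataset as "None", but the model metadata and tags indicate the Common Voice Ukrainian dataset. It achieves the following results on the evaluation set: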

Loss: 0.0927
Wer: 0.1222
Cer: 0.0204
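The WER and CER values above are fractions (0.1222 is 12.22%). To make the metric concrete, here is a minimal pure-Python sketch of how both are commonly computed from Levenshtein edit distance. It is an illustration only: the function names are hypothetical, and the card does not state which library or text normalization produced its numbers.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference, which is one reason it should not be read as one minus accuracy.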

✨ Features

Automatic Speech Recognition: Specialized for automatic speech recognition tasks.
Based on Common Voice: Trained with data from the Common Voice dataset.

📚 Documentation

Training and Evaluation Data

More information needed

Training Procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 3e-05
train_batch_size: 40
eval_batch_size: 40
seed: 42
gradient_accumulation_steps: 6
total_train_batch_size: 240
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 100
num_epochs: 100
mixed_precision_training: Native AMP
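Two of the hyperparameters above can be checked with simple arithmetic. The sketch below reproduces the effective batch size (40 × 6 = 240, assuming a single device) and a linear schedule with 100 warmup steps. The `total_steps` default is a placeholder assumption, since the card does not report the total number of optimizer steps, and `linear_schedule_lr` is a hypothetical helper that mirrors the usual linear warmup and decay rule rather than the trainer's actual code.

```python
TRAIN_BATCH_SIZE = 40
GRADIENT_ACCUMULATION_STEPS = 6

# Effective batch size per optimizer step, assuming a single device.
effective_batch_size = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS


def linear_schedule_lr(step: int,
                       base_lr: float = 3e-5,
                       warmup_steps: int = 100,
                       total_steps: int = 12000) -> float:
    """Learning rate at a given optimizer step.

    Rises linearly from 0 to base_lr over warmup_steps, then decays
    linearly to 0 at total_steps. total_steps is an assumed placeholder.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)
```

With these defaults the rate peaks at 3e-05 at step 100 and falls steadily afterward; a different `total_steps` changes only the slope of the decay.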

Training results

Training Loss	Epoch	Step	Cer	Validation Loss	Wer
9.0008	1.68	200	1.0	3.7590	1.0
3.4972	3.36	400	1.0	3.3933	1.0
3.3432	5.04	600	1.0	3.2617	1.0
3.2421	6.72	800	1.0	3.0712	1.0
1.9839	7.68	1000	0.1400	0.7204	0.6561
0.8017	9.36	1200	0.0766	0.3734	0.4159
0.5554	11.04	1400	0.0583	0.2621	0.3237
0.4309	12.68	1600	0.0486	0.2085	0.2753
0.3697	14.36	1800	0.0421	0.1746	0.2427
0.3293	16.04	2000	0.0388	0.1597	0.2243
0.2934	17.72	2200	0.0358	0.1428	0.2083
0.2704	19.4	2400	0.0333	0.1326	0.1949
0.2547	21.08	2600	0.0322	0.1255	0.1882
0.2366	22.76	2800	0.0309	0.1211	0.1815
0.2183	24.44	3000	0.0294	0.1159	0.1727
0.2115	26.13	3200	0.0280	0.1117	0.1661
0.1968	27.8	3400	0.0274	0.1063	0.1622
0.1922	29.48	3600	0.0269	0.1082	0.1598
0.1847	31.17	3800	0.0260	0.1061	0.1550
0.1715	32.84	4000	0.0252	0.1014	0.1496
0.1689	34.53	4200	0.0250	0.1012	0.1492
0.1655	36.21	4400	0.0243	0.0999	0.1450
0.1585	37.88	4600	0.0239	0.0967	0.1432
0.1492	39.57	4800	0.0237	0.0978	0.1421
0.1491	41.25	5000	0.0236	0.0963	0.1412
0.1453	42.93	5200	0.0230	0.0979	0.1373
0.1386	44.61	5400	0.0227	0.0959	0.1353
0.1387	46.29	5600	0.0226	0.0927	0.1355
0.1329	47.97	5800	0.0224	0.0951	0.1341
0.1295	49.65	6000	0.0219	0.0950	0.1306
0.1287	51.33	6200	0.0216	0.0937	0.1290
0.1277	53.02	6400	0.0215	0.0963	0.1294
0.1201	54.69	6600	0.0213	0.0959	0.1282
0.1199	56.38	6800	0.0215	0.0944	0.1286
0.1221	58.06	7000	0.0209	0.0938	0.1249
0.1145	59.68	7200	0.0208	0.0941	0.1254
0.1143	61.36	7400	0.0209	0.0941	0.1249
0.1143	63.04	7600	0.0209	0.0940	0.1248
0.1137	64.72	7800	0.0205	0.0931	0.1234
0.1125	66.4	8000	0.0204	0.0927	0.1222
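For readers who want to work with the log programmatically, the sketch below parses the last six rows of the table (copied verbatim) and selects the checkpoint with the lowest WER. The `EvalRow` type and `parse_rows` helper are hypothetical names introduced for illustration; the column order follows the table header.

```python
from typing import List, NamedTuple


class EvalRow(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    cer: float
    val_loss: float
    wer: float


# Last six rows of the training log, whitespace-separated.
LOG_TAIL = """
0.1221 58.06 7000 0.0209 0.0938 0.1249
0.1145 59.68 7200 0.0208 0.0941 0.1254
0.1143 61.36 7400 0.0209 0.0941 0.1249
0.1143 63.04 7600 0.0209 0.0940 0.1248
0.1137 64.72 7800 0.0205 0.0931 0.1234
0.1125 66.4 8000 0.0204 0.0927 0.1222
"""


def parse_rows(text: str) -> List[EvalRow]:
    """Parse rows in the order: train loss, epoch, step, CER, val loss, WER."""
    rows = []
    for line in text.strip().splitlines():
        tl, ep, st, cer, vl, wer = line.split()
        rows.append(EvalRow(float(tl), float(ep), int(st),
                            float(cer), float(vl), float(wer)))
    return rows


rows = parse_rows(LOG_TAIL)
best = min(rows, key=lambda r: r.wer)  # checkpoint with the lowest WER
```

Here `best` is the step-8000 row, which matches the loss, WER, and CER reported in the summary at the top of the card.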

Framework versions

Transformers 4.25.1
Pytorch 1.13.1+cu117
Datasets 2.8.0
Tokenizers 0.13.2

📄 License

This project is licensed under the MIT license.

🔧 Technical Details

Model Index

Name: wav2vec2-xls-r-300m-uk
Results:
- Task:
  - Name: Speech Recognition
  - Type: automatic-speech-recognition
- Dataset:
  - Name: Common Voice uk
  - Type: common_voice
  - Args: uk
- Metrics:
  - Name: Test WER
  - Type: wer
  - Value: 12.22

Tags

automatic-speech-recognition
common_voice
generated_from_trainer

Datasets

common_voice
