# wav2vec2-large-xls-r-300m-kk-with-LM
This is a fine-tuned model based on facebook/wav2vec2-xls-r-300m for Automatic Speech Recognition in the Kazakh (kk) language. It delivers solid speech recognition performance on the evaluation datasets listed below.
## Quick Start

### Evaluation Commands

- Evaluate on `mozilla-foundation/common_voice_8_0` with the `test` split (a dataset-loading sketch follows this list):

  ```bash
  python eval.py --model_id DrishtiSharma/wav2vec2-large-xls-r-300m-kk-with-LM --dataset mozilla-foundation/common_voice_8_0 --config kk --split test --log_outputs
  ```

- Evaluate on `speech-recognition-community-v2/dev_data`: not applicable, since Kazakh is not available in this dataset.
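For reference, the test split used by the command above can be loaded with 🤗 Datasets. This is a minimal sketch, not part of the original card; `mozilla-foundation/common_voice_8_0` is gated on the Hub, so it assumes you have accepted the dataset terms and are authenticated.

```python
from datasets import Audio, load_dataset

# Common Voice 8.0 is gated; use_auth_token assumes a prior `huggingface-cli login`.
test = load_dataset("mozilla-foundation/common_voice_8_0", "kk", split="test", use_auth_token=True)

# Resample to the 16 kHz rate expected by XLS-R models.
test = test.cast_column("audio", Audio(sampling_rate=16_000))

print(test[0]["sentence"])        # reference transcription
print(test[0]["audio"]["array"])  # waveform as a float array
```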
## Features

- Multilingual Adaptability: Built on a large-scale multilingual pre-trained model, it adapts well to the Kazakh language.
- High-Performance Metrics: Achieves competitive WER (Word Error Rate) and CER (Character Error Rate) on the evaluation datasets.
## Installation

No specific installation steps are provided in the original model card.
## Usage Examples

The original model card does not include usage examples.
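The following is a minimal transcription sketch under stated assumptions: the checkpoint bundles a `Wav2Vec2ProcessorWithLM` (so `pyctcdecode` and `kenlm` need to be installed alongside `transformers` and `torch`), and `speech` is a 16 kHz mono float array, for example taken from the Common Voice split loaded above.

```python
import torch
from transformers import AutoModelForCTC, AutoProcessor

model_id = "DrishtiSharma/wav2vec2-large-xls-r-300m-kk-with-LM"

# If the repo bundles an n-gram LM, AutoProcessor returns a Wav2Vec2ProcessorWithLM.
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForCTC.from_pretrained(model_id)

def transcribe(speech, sampling_rate=16_000):
    """Transcribe a 16 kHz mono waveform (1-D float array) to Kazakh text."""
    inputs = processor(speech, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    # batch_decode on an LM-backed processor runs beam search over the n-gram LM;
    # it expects the raw logits as a numpy array.
    return processor.batch_decode(logits.numpy()).text[0]
```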
## Documentation

### Model Information

| Property | Details |
|----------|---------|
| Model Type | wav2vec2-large-xls-r-300m-kk-with-LM |
| Training Data | mozilla-foundation/common_voice_8_0 |
### Evaluation Results
The model achieves the following results on the evaluation datasets (a metric-computation sketch follows the list):
- Common Voice 8.0 (kk):
  - Test WER: 0.4355
  - Test CER: 0.10469915859660263
  - Test WER (+LM): 0.417
  - Test CER (+LM): 0.10319098269566598
- Robust Speech Event - Dev Data (kk):
  - Test WER: NA
  - Test CER: NA
- Robust Speech Event - Test Data (kk): results not reported
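The WER/CER figures above follow the standard definitions. The sketch below shows how such scores can be computed with the `jiwer` package; this is an illustration only (the card's own `eval.py` is not reproduced here, and the sentences are made up).

```python
import jiwer

# Illustrative reference/hypothesis pairs; real evaluation uses the full test split.
references  = ["бүгін ауа райы жақсы", "менің атым Айдар"]
predictions = ["бүгін ауа райы жаксы", "менің атым айдар"]

wer = jiwer.wer(references, predictions)  # word error rate
cer = jiwer.cer(references, predictions)  # character error rate
print(f"WER = {wer:.4f}, CER = {cer:.4f}")
```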
### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 0.000222
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1000
- num_epochs: 150.0
- mixed_precision_training: Native AMP
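These values map naturally onto `transformers.TrainingArguments`. The original training script is not included in the card, so the sketch below is an assumption: `output_dir` is a placeholder, and Adam's betas/epsilon are left at their defaults, which match the listed values.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./wav2vec2-large-xls-r-300m-kk-with-LM",  # placeholder path
    learning_rate=0.000222,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,   # 16 x 2 = effective train batch size of 32
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=1000,
    num_train_epochs=150.0,
    fp16=True,                       # "Native AMP" mixed-precision training
)
```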
### Training results

| Training Loss | Epoch | Step | Validation Loss | WER |
|---------------|-------|------|-----------------|-----|
| 9.6799 | 9.09 | 200 | 3.6119 | 1.0 |
| 3.1332 | 18.18 | 400 | 2.5352 | 1.005 |
| 1.0465 | 27.27 | 600 | 0.6169 | 0.682 |
| 0.3452 | 36.36 | 800 | 0.6572 | 0.607 |
| 0.2575 | 45.44 | 1000 | 0.6527 | 0.578 |
| 0.2088 | 54.53 | 1200 | 0.6828 | 0.551 |
| 0.158 | 63.62 | 1400 | 0.7074 | 0.5575 |
| 0.1309 | 72.71 | 1600 | 0.6523 | 0.5595 |
| 0.1074 | 81.8 | 1800 | 0.7262 | 0.5415 |
| 0.087 | 90.89 | 2000 | 0.7199 | 0.521 |
| 0.0711 | 99.98 | 2200 | 0.7113 | 0.523 |
| 0.0601 | 109.09 | 2400 | 0.6863 | 0.496 |
| 0.0451 | 118.18 | 2600 | 0.6998 | 0.483 |
| 0.0378 | 127.27 | 2800 | 0.6971 | 0.4615 |
| 0.0319 | 136.36 | 3000 | 0.7119 | 0.4475 |
| 0.0305 | 145.44 | 3200 | 0.7181 | 0.459 |
### Framework versions
- Transformers 4.17.0.dev0
- Pytorch 1.10.2+cu102
- Datasets 1.18.2.dev0
- Tokenizers 0.11.0
## Technical Details

No in-depth technical details are provided in the original model card.
## License

This model is released under the Apache-2.0 license.