Open-source ASR model wav2vec2-xls-r-300m-kk-n2: Facilitating accurate Kazakh speech recognition

Wav2vec2 Xls R 300m Kk N2

Developed by DrishtiSharma

This is an automatic speech recognition (ASR) model for Kazakh (kk), fine-tuned from the facebook/wav2vec2-xls-r-300m model on Kazakh speech data.

Speech Recognition

Transformers

Other · Open Source License: Apache-2.0 · #Kazakh speech recognition #Multi-dialect robustness #Low CER performance

Downloads 15

Release Time: 3/2/2022

Model Overview

This model transcribes Kazakh speech into text. It was fine-tuned on the Kazakh portion of the Common Voice 8.0 dataset.
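The model takes raw audio as input, which must be 16 kHz mono. Its convolutional feature encoder first shrinks the waveform into a shorter sequence of frames, and the model predicts one character token per frame. The sketch below shows the frame-count arithmetic. It assumes this checkpoint keeps the default `conv_kernel` and `conv_stride` values of `Wav2Vec2Config`. The constants and the helper `num_output_frames` are illustrative, not taken from the card.

```python
# Assumption: the checkpoint uses the default wav2vec2 feature-encoder
# settings (conv_kernel / conv_stride in Wav2Vec2Config).
CONV_KERNEL = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDE = (5, 2, 2, 2, 2, 2, 2)
SAMPLE_RATE = 16_000  # XLS-R models expect 16 kHz mono audio


def num_output_frames(num_samples: int) -> int:
    """Length of the frame sequence produced from `num_samples` audio samples."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNEL, CONV_STRIDE):
        # Output length of an unpadded 1-D convolution.
        length = (length - kernel) // stride + 1
    return length


print(num_output_frames(SAMPLE_RATE))      # 49 frames for 1 second of audio
print(num_output_frames(2 * SAMPLE_RATE))  # 99 frames for 2 seconds
```

The strides multiply to 5 · 2⁶ = 320 samples. At 16 kHz, that means one frame every 20 ms.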

Model Features

Kazakh Language Optimization

Fine-tuned specifically for Kazakh speech recognition

Based on Large-scale Pre-trained Model

Fine-tuned from Facebook's wav2vec2-xls-r-300m, so it builds on the speech representations XLS-R learned during self-supervised pre-training on unlabeled multilingual audio

Medium-sized Model

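At roughly 300M parameters, this is the smallest XLS-R variant (the others have 1B and 2B parameters). It is therefore cheaper to fine-tune and run than the larger checkpoints, typically at some cost in accuracy.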
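"Medium-sized" can be made concrete with a back-of-the-envelope estimate of the memory needed just to hold the weights. This is a sketch that treats "300M" as a nominal parameter count; the checkpoint's exact count may differ slightly. It ignores activations and, during training, the extra optimizer state that Adam keeps per parameter.

```python
# Nominal parameter count taken from the model name (an approximation).
NOMINAL_PARAMS = 300_000_000

BYTES_PER_PARAM = {
    "float32": 4,  # default PyTorch dtype
    "float16": 2,  # half precision
}


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Memory for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gb(NOMINAL_PARAMS, dtype):.1f} GB")
```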

Model Capabilities

Kazakh speech recognition

Speech-to-text

Automatic speech recognition
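Speech-to-text with a wav2vec2 CTC model ends with a decoding step. The model emits a score for every vocabulary token at every frame. Greedy CTC decoding then works in four steps:

- take the best token per frame;
- merge consecutive repeats;
- drop the blank token;
- turn the word delimiter into a space.

In Transformers, `Wav2Vec2ForCTC` produces the per-frame logits and `Wav2Vec2Processor.batch_decode` turns the predicted ids into text. The snippet below is a simplified plain-Python sketch of that idea, not the library's code. The seven-token vocabulary is hypothetical: the real model's vocabulary and token ids differ.

```python
from itertools import groupby
from typing import List, Sequence

# Hypothetical toy vocabulary (a few Kazakh Cyrillic letters plus specials).
VOCAB = ["<pad>", "|", "с", "ә", "л", "е", "м"]
BLANK = "<pad>"       # wav2vec2 CTC tokenizers use the pad token as the CTC blank
WORD_DELIMITER = "|"  # and "|" to mark word boundaries


def greedy_ctc_decode(frame_logits: Sequence[Sequence[float]]) -> str:
    """Best token per frame -> merge repeats -> drop blanks -> join words."""
    best_ids = [max(range(len(row)), key=row.__getitem__) for row in frame_logits]
    merged = [VOCAB[token_id] for token_id, _ in groupby(best_ids)]
    text = "".join(token for token in merged if token != BLANK)
    return text.replace(WORD_DELIMITER, " ").strip()


def one_hot(ids: Sequence[int]) -> List[List[float]]:
    """Fake logits that make each listed id the clear winner for its frame."""
    return [[1.0 if j == i else 0.0 for j in range(len(VOCAB))] for i in ids]


# Repeated "с" and "л" frames collapse; blanks are dropped.
print(greedy_ctc_decode(one_hot([2, 2, 0, 3, 4, 4, 0, 5, 6])))  # сәлем
```

A blank between two identical tokens is what lets CTC emit a genuine double letter. For example, ids `[4, 0, 4]` decode to "лл", while `[4, 4]` decode to a single "л".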

Use Cases

Speech Transcription

Kazakh Speech Transcription

Convert Kazakh speech into text

WER of 0.4355 on the Common Voice 8.0 Kazakh test set

Voice Assistants

Kazakh Voice Command Recognition

Can be used to recognize voice commands in Kazakh voice assistants

🚀 wav2vec2-xls-r-300m-kk-n2

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the Kazakh (kk) configuration of the mozilla-foundation/common_voice_8_0 dataset. It is intended for automatic speech recognition, i.e. transcribing Kazakh speech into text.

🚀 Quick Start

On the evaluation set, the model achieves the following results:

Loss: 0.7149
WER: 0.451

✨ Features

Fine-tuned: starts from the pre-trained facebook/wav2vec2-xls-r-300m and is fine-tuned on Kazakh data for better performance on that language.
Multiple evaluation metrics: evaluated using loss, WER (word error rate), and CER (character error rate).
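WER and CER use the same edit-distance formula at different granularities. The number of substitutions, deletions, and insertions needed to turn the hypothesis into the reference is divided by the number of reference words (WER) or characters (CER). A WER of 0.451 therefore means roughly 45 word errors per 100 reference words.

Below is a plain-Python sketch of the standard definitions, not the evaluation script's own code. The Kazakh sentences are illustrative only. The corpus-level function assumes errors and reference words are summed over all utterances before dividing, which is how common WER implementations aggregate.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free if tokens match)
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate for one non-empty reference."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate for one non-empty reference (spaces count)."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def corpus_wer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Total word errors over total reference words, across all utterances."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    return errors / sum(len(r.split()) for r in refs)


ref, hyp = "мен мектепке барамын", "мен мектепке барамыз"
print(f"WER={wer(ref, hyp):.3f} CER={cer(ref, hyp):.3f}")  # WER=0.333 CER=0.050
```

One wrong letter costs a whole word under WER but only one character under CER. This is why CER is usually much lower than WER for the same transcripts.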

📦 Installation

The original README provides no installation steps; the library versions used for training are listed under Framework versions.

💻 Usage Examples

Evaluation Commands

Evaluate on mozilla-foundation/common_voice_8_0 (test split)

python eval.py --model_id DrishtiSharma/wav2vec2-xls-r-300m-kk-n2 --dataset mozilla-foundation/common_voice_8_0 --config kk --split test --log_outputs

Evaluate on speech-recognition-community-v2/dev_data

Not available: speech-recognition-community-v2/dev_data contains no Kazakh data.

📚 Documentation

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.000222
train_batch_size: 16
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 1000
num_epochs: 150.0
mixed_precision_training: Native AMP
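Two of these hyperparameters follow from simple formulas:

- The total batch size of 32 is the per-device batch size times the gradient-accumulation steps, which implies a single device.
- A "linear" schedule with 1000 warmup steps ramps the learning rate up linearly and then decays it linearly to zero. The sketch mirrors the formula of Transformers' `get_linear_schedule_with_warmup`.

The total step count is an assumption: the training-results table shows step 200 at epoch 9.09, i.e. about 22 optimizer steps per epoch, or about 3300 steps over 150 epochs.

```python
LEARNING_RATE = 0.000222
TRAIN_BATCH_SIZE = 16
GRAD_ACCUM_STEPS = 2
NUM_DEVICES = 1  # assumption: implied by the reported total batch size of 32
WARMUP_STEPS = 1000
# Assumption: ~22 optimizer steps per epoch (inferred from the results table).
TOTAL_STEPS = 22 * 150


def total_train_batch_size(per_device: int, accum: int, devices: int) -> int:
    """Examples contributing to each optimizer update."""
    return per_device * accum * devices


def linear_warmup_lr(step: int, base_lr: float = LEARNING_RATE,
                     warmup: int = WARMUP_STEPS, total: int = TOTAL_STEPS) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at `total`."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


print(total_train_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))  # 32
for step in (0, 500, 1000, 2000, 3300):
    print(step, f"{linear_warmup_lr(step):.6f}")
```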

Training results

Training Loss	Epoch	Step	Validation Loss	WER
9.6799	9.09	200	3.6119	1.0
3.1332	18.18	400	2.5352	1.005
1.0465	27.27	600	0.6169	0.682
0.3452	36.36	800	0.6572	0.607
0.2575	45.44	1000	0.6527	0.578
0.2088	54.53	1200	0.6828	0.551
0.158	63.62	1400	0.7074	0.5575
0.1309	72.71	1600	0.6523	0.5595
0.1074	81.8	1800	0.7262	0.5415
0.087	90.89	2000	0.7199	0.521
0.0711	99.98	2200	0.7113	0.523
0.0601	109.09	2400	0.6863	0.496
0.0451	118.18	2600	0.6998	0.483
0.0378	127.27	2800	0.6971	0.4615
0.0319	136.36	3000	0.7119	0.4475
0.0305	145.44	3200	0.7181	0.459
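The table above is easier to read with a few summary figures:

- the checkpoint with the lowest WER;
- the checkpoint with the lowest validation loss;
- the implied number of optimizer steps per epoch.

The sketch below copies the rows verbatim and computes these. The tuple layout is an illustrative choice, not part of the card.

```python
# (step, epoch, training_loss, validation_loss, wer), copied from the table.
RESULTS = [
    (200, 9.09, 9.6799, 3.6119, 1.0),
    (400, 18.18, 3.1332, 2.5352, 1.005),
    (600, 27.27, 1.0465, 0.6169, 0.682),
    (800, 36.36, 0.3452, 0.6572, 0.607),
    (1000, 45.44, 0.2575, 0.6527, 0.578),
    (1200, 54.53, 0.2088, 0.6828, 0.551),
    (1400, 63.62, 0.158, 0.7074, 0.5575),
    (1600, 72.71, 0.1309, 0.6523, 0.5595),
    (1800, 81.8, 0.1074, 0.7262, 0.5415),
    (2000, 90.89, 0.087, 0.7199, 0.521),
    (2200, 99.98, 0.0711, 0.7113, 0.523),
    (2400, 109.09, 0.0601, 0.6863, 0.496),
    (2600, 118.18, 0.0451, 0.6998, 0.483),
    (2800, 127.27, 0.0378, 0.6971, 0.4615),
    (3000, 136.36, 0.0319, 0.7119, 0.4475),
    (3200, 145.44, 0.0305, 0.7181, 0.459),
]

best_wer = min(RESULTS, key=lambda row: row[4])
best_val_loss = min(RESULTS, key=lambda row: row[3])
steps_per_epoch = RESULTS[0][0] / RESULTS[0][1]

print(f"lowest WER {best_wer[4]} at step {best_wer[0]}")  # lowest WER 0.4475 at step 3000
print(f"lowest validation loss {best_val_loss[3]} at step {best_val_loss[0]}")  # lowest validation loss 0.6169 at step 600
print(f"~{steps_per_epoch:.1f} optimizer steps per epoch")  # ~22.0 optimizer steps per epoch
```

Validation loss bottoms out early (step 600) while WER keeps improving until step 3000. So, for choosing a checkpoint, WER appears to be the more informative signal here.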

Framework versions

Transformers 4.17.0.dev0
Pytorch 1.10.2+cu102
Datasets 1.18.2.dev0
Tokenizers 0.11.0
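When reproducing results, it can help to compare the local environment against these versions. The ".dev0" suffixes mark development builds, which are typically installed from source rather than from PyPI. The sketch below uses only the standard library. It assumes PyTorch is installed under its PyPI distribution name `torch`. The helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional, Tuple

# Versions listed in the model card, keyed by (assumed) distribution name.
TRAINED_WITH = {
    "transformers": "4.17.0.dev0",
    "torch": "1.10.2+cu102",
    "datasets": "1.18.2.dev0",
    "tokenizers": "0.11.0",
}


def installed_version(dist: str) -> Optional[str]:
    """Installed version of a distribution, or None if it is missing."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def compare_versions(
    expected: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (expected, installed)} for every mismatch."""
    report = {}
    for dist, wanted in expected.items():
        found = lookup(dist)
        if found != wanted:
            report[dist] = (wanted, found)
    return report


for dist, (wanted, found) in compare_versions(TRAINED_WITH).items():
    print(f"{dist}: card lists {wanted}, installed {found or 'not installed'}")
```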

🔧 Technical Details

The model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the Kazakh (kk) configuration of mozilla-foundation/common_voice_8_0, trained with the hyperparameters listed under Training hyperparameters.

📄 License

The model is released under the Apache-2.0 license.

Property	Details
Model Type	Fine-tuned wav2vec2-xls-r-300m for Kazakh speech recognition
Training Data	mozilla-foundation/common_voice_8_0 (kk configuration)
