Central Kurdish XLSR Open-Source Automatic Speech Recognition Model - Supports Recognition of Central Kurdish Dialects

Central Kurdish Xlsr

Developed by Akashpb13

This is an automatic speech recognition model fine-tuned on the Central Kurdish dialect based on the facebook/wav2vec2-xls-r-300m model, trained on the Common Voice 8.0 dataset.

Speech Recognition

Transformers

Open Source License: Apache-2.0

Tags: #Kurdish speech recognition #XLS-R fine-tuning #Low CER performance

Downloads 45

Release Time : 3/2/2022

Model Overview

This model is specifically designed for automatic speech recognition tasks in the Central Kurdish dialect, capable of converting speech to text.

Model Features

Central Kurdish dialect support

A speech recognition model specifically optimized for the Central Kurdish dialect.

Based on XLS-R architecture

Uses Facebook's wav2vec2-xls-r-300m as the base model, with powerful speech feature extraction capabilities.

Multi-dataset training

Trained on Common Voice data and evaluated on both the Common Voice 8 test set and the Robust Speech Event dev data.

Model Capabilities

Kurdish speech recognition

Speech-to-text

Use Cases

Speech transcription

Kurdish speech transcription

Converts speech in the Central Kurdish dialect to text

WER 0.3675, CER 0.0783 (reported on the Common Voice 8 test set)

Voice assistants

Kurdish voice command recognition

Speech recognition module for Kurdish voice assistants or voice control systems

🚀 Akashpb13/Central_kurdish_xlsr

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the Central Kurdish (ckb) subset of Mozilla Common Voice (the model metadata lists mozilla-foundation/common_voice_8_0). It is designed for automatic speech recognition and reports a WER of 0.3675 and a CER of 0.0783 on the Common Voice 8 test set.
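The model card does not include an inference snippet. The following is a minimal sketch of how a checkpoint like this is typically loaded, assuming the `transformers` library (plus a PyTorch backend and `ffmpeg` for audio decoding) is installed. The helper names `build_transcriber` and `transcribe` and the file name `sample_ckb.wav` are hypothetical and are not part of the original card.

```python
from typing import Any, Dict

# Model identifier as given in this model card.
MODEL_ID = "Akashpb13/Central_kurdish_xlsr"


def build_transcriber(model_id: str = MODEL_ID) -> Any:
    """Create a speech-recognition pipeline (downloads the weights on first use)."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model_id)


def transcribe(audio_path: str, transcriber: Any) -> str:
    """Transcribe one audio file and return only the recognized text.

    When given a file path, the pipeline decodes the audio with ffmpeg and
    resamples it to the sampling rate expected by the model.
    """
    result: Dict[str, Any] = transcriber(audio_path)
    return result["text"]


# Example usage (hypothetical file name, requires network access on first run):
#   asr = build_transcriber()
#   print(transcribe("sample_ckb.wav", asr))
```

Keeping the pipeline construction separate from the transcription call means the weights are loaded once and reused across files.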

📚 Documentation

Model Information

Property	Details
Language	ckb
License	apache-2.0
Tags	automatic-speech-recognition, mozilla-foundation/common_voice_8_0, generated_from_trainer, ckb, robust-speech-event, model_for_talk, hf-asr-leaderboard
Datasets	mozilla-foundation/common_voice_8_0

Model Index

Name: Akashpb13/Central_kurdish_xlsr
Results:
- Task 1:
  - Task Name: Automatic Speech Recognition
  - Task Type: automatic-speech-recognition
  - Dataset:
    - Name: Common Voice 8
    - Type: mozilla-foundation/common_voice_8_0
    - Args: ckb
  - Metrics:
    - Test WER: 0.36754389884276845
    - Test CER: 0.07827896768334217
- Task 2:
  - Task Name: Automatic Speech Recognition
  - Task Type: automatic-speech-recognition
  - Dataset:
    - Name: Robust Speech Event - Dev Data
    - Type: speech-recognition-community-v2/dev_data
    - Args: ckb
    - Args: ckb
  - Metrics:
    - Test WER: 0.36754389884276845
    - Test CER: 0.07827896768334217
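For readers unfamiliar with the two metrics above, here is a small sketch of how word error rate (WER) and character error rate (CER) are commonly defined: the Levenshtein edit distance between reference and hypothesis, divided by the reference length in words or characters. This is a plain-Python illustration for a single utterance, not the evaluation script used for this model; the function names are hypothetical, and the reference is assumed to be non-empty.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference items
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate for one utterance (reference must contain at least one word)."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate for one utterance (reference must be non-empty)."""
    return edit_distance(reference, hypothesis) / len(reference)
```

Corpus-level scores are usually computed by summing edit distances and reference lengths over all utterances before dividing, rather than averaging per-utterance rates.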

Model Description

This model was obtained by fine-tuning facebook/wav2vec2-xls-r-300m.

Intended Uses & Limitations

More information needed

Training and Evaluation Data

Training Data: The Common Voice Central Kurdish files train.tsv, dev.tsv, invalidated.tsv, reported.tsv, and other.tsv. Only samples with more upvotes than downvotes were kept, and duplicates were removed after concatenating all of the files provided in Common Voice 7.0.
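The filtering described above could look roughly like the following sketch. It assumes the standard Common Voice TSV columns `path`, `up_votes`, and `down_votes`, and it assumes duplicates are identified by the `path` column; the card does not say which key was actually used. The function names are hypothetical.

```python
import csv
from typing import Dict, Iterable, List

Row = Dict[str, str]


def read_tsv(path: str) -> List[Row]:
    """Read one Common Voice TSV file into a list of row dictionaries."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))


def filter_and_dedupe(rows: Iterable[Row], key: str = "path") -> List[Row]:
    """Keep rows with more upvotes than downvotes, dropping repeated keys.

    Assumption: `key` identifies a duplicate clip. Missing vote counts are
    treated as zero, so such rows are dropped.
    """
    seen = set()
    kept: List[Row] = []
    for row in rows:
        up = int(row.get("up_votes") or 0)
        down = int(row.get("down_votes") or 0)
        if up <= down:
            continue  # not enough positive votes
        if row[key] in seen:
            continue  # already kept an earlier copy of this clip
        seen.add(row[key])
        kept.append(row)
    return kept


# Example usage (file names as listed in the card, paths are hypothetical):
#   files = ["train.tsv", "dev.tsv", "invalidated.tsv", "reported.tsv", "other.tsv"]
#   rows = [row for name in files for row in read_tsv(name)]
#   clean = filter_and_dedupe(rows)
```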

Training Procedure

To create the training dataset, all available datasets were appended and a 90-10 split was applied.
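A 90-10 split of the combined data can be sketched as below. The card does not say whether the data was shuffled or which seed was used for the split; shuffling with seed 13 (the training seed listed under the hyperparameters) is an assumption made here for reproducibility, and `split_90_10` is a hypothetical helper name.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_90_10(items: Sequence[T], seed: int = 13) -> Tuple[List[T], List[T]]:
    """Shuffle deterministically, then return (90% portion, 10% portion)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG, global state untouched
    cut = len(shuffled) * 9 // 10          # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]
```

Using a dedicated `random.Random` instance keeps the split reproducible regardless of other random calls in the program.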

Training Hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.000095637994662983496
train_batch_size: 16
eval_batch_size: 16
seed: 13
gradient_accumulation_steps: 2
lr_scheduler_type: cosine_with_restarts
lr_scheduler_warmup_steps: 200
num_epochs: 100
mixed_precision_training: Native AMP
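To make the schedule settings above concrete, the sketch below computes the per-device effective batch size and the learning rate at a given step under linear warmup followed by cosine decay with hard restarts. It mirrors the formula commonly used for `cosine_with_restarts`, with a single cycle assumed. The total number of training steps is not stated in the card; passing 17000 (the last step logged in the results table) is an assumption, and `lr_at` is a hypothetical helper name.

```python
import math

# Values copied from the hyperparameter list.
PEAK_LR = 0.000095637994662983496
WARMUP_STEPS = 200
TRAIN_BATCH_SIZE = 16
GRAD_ACCUM_STEPS = 2

# Samples contributing to each optimizer update on one device.
EFFECTIVE_BATCH_SIZE = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS


def lr_at(step: int, total_steps: int, num_cycles: int = 1) -> float:
    """Learning rate at `step`: linear warmup, then cosine decay with hard restarts."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    if progress >= 1.0:
        return 0.0
    # Position within the current cycle, in [0, 1).
    cycle_pos = (num_cycles * progress) % 1.0
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * cycle_pos)))
```

With one cycle the curve is a plain cosine decay after warmup; larger `num_cycles` values reset the rate to its peak at the start of each cycle.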

Training Results

Step	Training Loss	Validation Loss	WER
500	5.097800	2.190326	1.001207
1000	0.797500	0.331392	0.576819
1500	0.405100	0.262009	0.549049
2000	0.322100	0.248178	0.479626
2500	0.264600	0.258866	0.488983
3000	0.228300	0.261523	0.469665
3500	0.201000	0.270135	0.451856
4000	0.180900	0.279302	0.448536
4500	0.163800	0.280921	0.459704
5000	0.147300	0.319249	0.471778
5500	0.137600	0.289546	0.449140
6000	0.132000	0.311350	0.458195
6500	0.117100	0.316726	0.432840
7000	0.109200	0.302210	0.439481
7500	0.104900	0.325913	0.439481
8000	0.097500	0.329446	0.431935
8500	0.088600	0.345259	0.425898
9000	0.084900	0.342891	0.428313
9500	0.080900	0.353081	0.424389
10000	0.075600	0.347063	0.424992
10500	0.072800	0.330086	0.424691
11000	0.068100	0.350658	0.421974
11500	0.064700	0.342949	0.413522
12000	0.061500	0.341704	0.415334
12500	0.059500	0.346279	0.411410
13000	0.057400	0.349901	0.407184
13500	0.056400	0.347733	0.402656
14000	0.053300	0.344899	0.405976
14500	0.052900	0.346708	0.402656
15000	0.050600	0.344118	0.400845
15500	0.050200	0.348396	0.402958
16000	0.049800	0.348312	0.401751
16500	0.051900	0.348372	0.401147
17000	0.049800	0.348580	0.401147
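One way to read the table above is to ask which logged step is best under each criterion. The sketch below does this for a handful of rows copied verbatim from the table (a subset chosen for brevity, not the full log). In the table, validation loss is lowest at step 2000 while WER is lowest at step 15000, so the choice of selection metric matters. The names `LoggedStep` and `best_by` are hypothetical.

```python
from typing import Callable, List, NamedTuple


class LoggedStep(NamedTuple):
    step: int
    train_loss: float
    val_loss: float
    wer: float


# Subset of rows copied from the training results table.
ROWS: List[LoggedStep] = [
    LoggedStep(500, 5.097800, 2.190326, 1.001207),
    LoggedStep(2000, 0.322100, 0.248178, 0.479626),
    LoggedStep(8000, 0.097500, 0.329446, 0.431935),
    LoggedStep(13500, 0.056400, 0.347733, 0.402656),
    LoggedStep(15000, 0.050600, 0.344118, 0.400845),
    LoggedStep(17000, 0.049800, 0.348580, 0.401147),
]


def best_by(rows: List[LoggedStep], metric: Callable[[LoggedStep], float]) -> LoggedStep:
    """Return the logged step that minimizes the given metric."""
    return min(rows, key=metric)


lowest_val_loss = best_by(ROWS, lambda r: r.val_loss)
lowest_wer = best_by(ROWS, lambda r: r.wer)
print(lowest_val_loss.step, lowest_wer.step)
```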

Framework Versions

Transformers 4.16.0.dev0
PyTorch 1.10.0+cu102
Datasets 1.18.1
Tokenizers 0.10.3

Evaluation Commands

1. To evaluate on `mozilla-foundation/common_voice_8_0` with split `test`

python eval.py --model_id Akashpb13/Central_kurdish_xlsr --dataset mozilla-foundation/common_voice_8_0 --config ckb --split test

📄 License

This project is licensed under the Apache-2.0 license.
