# wav2vec2-xls-r-300m-cv6-turkish
This is an Automatic Speech Recognition (ASR) model fine-tuned for Turkish, offering high-quality speech recognition.
## Quick Start
This ASR model is a fine-tuned version of facebook/wav2vec2-xls-r-300m for Turkish.
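For quick transcription outside the evaluation script, a minimal sketch using the Transformers ASR pipeline is shown below; only the model id comes from this card, and the audio file name is a placeholder.

```python
# Minimal inference sketch (assumes transformers and torch are installed;
# "sample_tr.wav" is a placeholder audio file, not part of the original card).
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="mpoyraz/wav2vec2-xls-r-300m-cv6-turkish",
)

# Transcribe a local Turkish audio file.
result = asr("sample_tr.wav")
print(result["text"])
```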
## Features
- Fine-tuned on Turkish for accurate Automatic Speech Recognition.
- Trained and evaluated on multiple Turkish speech datasets.
- Uses an n-gram language model trained on Turkish Wikipedia articles.
## Installation
Before running evaluation, please install the unicode_tr package. It is used for Turkish text processing.
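As a quick illustration of why the package matters (assuming the standard unicode_tr API), Turkish casing rules differ from Python's default `str.lower()`:

```python
# pip install unicode_tr
# Illustration (assumed standard unicode_tr API): Python's built-in lower()
# mishandles the Turkish dotted capital "İ", while unicode_tr lowers it correctly.
from unicode_tr import unicode_tr

print("İSTANBUL".lower())              # "i̇stanbul" - leaves a combining dot
print(unicode_tr("İSTANBUL").lower())  # "istanbul" - Turkish-aware lowercasing
```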
## Usage Examples
### Basic Usage
To evaluate on common_voice with the test split:
python eval.py --model_id mpoyraz/wav2vec2-xls-r-300m-cv6-turkish --dataset common_voice --config tr --split test
### Advanced Usage
To evaluate on speech-recognition-community-v2/dev_data:
python eval.py --model_id mpoyraz/wav2vec2-xls-r-300m-cv6-turkish --dataset speech-recognition-community-v2/dev_data --config tr --split validation --chunk_length_s 5.0 --stride_length_s 1.0
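The same chunking parameters can also be used for long-form transcription with the Transformers ASR pipeline; the sketch below is an assumption about typical usage, and the audio file name is a placeholder.

```python
# Hedged sketch: the chunking parameters from the eval command above applied to
# long-form transcription. "long_audio_tr.wav" is a placeholder file.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="mpoyraz/wav2vec2-xls-r-300m-cv6-turkish",
)

result = asr("long_audio_tr.wav", chunk_length_s=5.0, stride_length_s=1.0)
print(result["text"])
```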
## Documentation
### Training and evaluation data
The following datasets were used for finetuning:
### Training procedure
To support both of the datasets above, custom pre-processing and loading steps were performed; the wav2vec2-turkish repo was used for that purpose.
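The actual pre-processing lives in the wav2vec2-turkish repo; the sketch below only illustrates the typical shape of such steps for Common Voice TR (loading, resampling to 16 kHz, Turkish-aware text normalization) and is not the repository's code.

```python
# Illustrative only: the real loading/pre-processing is in the wav2vec2-turkish repo.
import re
from datasets import load_dataset, Audio
from unicode_tr import unicode_tr

cv_tr = load_dataset("common_voice", "tr", split="train+validation")

# wav2vec2 XLS-R expects 16 kHz audio.
cv_tr = cv_tr.cast_column("audio", Audio(sampling_rate=16_000))

chars_to_remove = re.compile(r"[\,\?\.\!\-\;\:\"\“\”\‘\’]")

def normalize(batch):
    # Turkish-aware lowercasing plus punctuation stripping.
    text = unicode_tr(batch["sentence"]).lower()
    batch["sentence"] = chars_to_remove.sub("", text).strip()
    return batch

cv_tr = cv_tr.map(normalize)
```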
### Training hyperparameters
The following hyperparameters were used for finetuning:
- learning_rate 2e-4
- num_train_epochs 10
- warmup_steps 500
- freeze_feature_extractor
- mask_time_prob 0.1
- mask_feature_prob 0.1
- feat_proj_dropout 0.05
- attention_dropout 0.05
- final_dropout 0.1
- activation_dropout 0.05
- per_device_train_batch_size 8
- per_device_eval_batch_size 8
- gradient_accumulation_steps 8
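As a rough, non-authoritative sketch of how these settings map onto the standard Transformers training setup (the output directory and any omitted pieces such as the tokenizer/vocabulary and data collator are assumptions, not taken from the original training script):

```python
# Hedged sketch of how the listed hyperparameters map onto Transformers objects;
# this is not the original training script.
from transformers import Wav2Vec2ForCTC, TrainingArguments

# vocab_size / pad_token_id from the fitted Turkish tokenizer are omitted here.
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-xls-r-300m",
    mask_time_prob=0.1,
    mask_feature_prob=0.1,
    feat_proj_dropout=0.05,
    attention_dropout=0.05,
    final_dropout=0.1,
    activation_dropout=0.05,
)
model.freeze_feature_extractor()  # corresponds to the freeze_feature_extractor flag

training_args = TrainingArguments(
    output_dir="wav2vec2-xls-r-300m-cv6-turkish",  # placeholder
    learning_rate=2e-4,
    num_train_epochs=10,
    warmup_steps=500,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,
)
```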
### Framework versions
- Transformers 4.17.0.dev0
- Pytorch 1.10.1
- Datasets 1.18.3
- Tokenizers 0.10.3
### Language Model
An n-gram language model was trained on Turkish Wikipedia articles using KenLM; the ngram-lm-wiki repo was used to generate the ARPA LM and convert it into binary format.
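One way such a KenLM binary can be combined with the acoustic model is CTC beam-search decoding via pyctcdecode; the sketch below is an assumption about typical usage (the `lm.binary` and audio paths are placeholders), not the card's own evaluation code.

```python
# Hedged sketch: combining the acoustic model with a KenLM binary via pyctcdecode.
# "lm.binary" and "sample_tr.wav" are placeholder paths.
import torch
import librosa
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
from pyctcdecode import build_ctcdecoder

model_id = "mpoyraz/wav2vec2-xls-r-300m-cv6-turkish"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# Build a CTC beam-search decoder backed by the n-gram LM (labels must be in id order).
vocab_dict = processor.tokenizer.get_vocab()
labels = [tok for tok, _ in sorted(vocab_dict.items(), key=lambda kv: kv[1])]
decoder = build_ctcdecoder(labels, kenlm_model_path="lm.binary")

speech, _ = librosa.load("sample_tr.wav", sr=16_000)
inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0].cpu().numpy()

print(decoder.decode(logits))
```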
### Evaluation results

| Dataset | WER | CER |
| --- | --- | --- |
| Common Voice 6.1 TR test split | 8.83 | 2.37 |
| Speech Recognition Community dev data | 32.81 | 11.22 |
## Technical Details
This model is based on fine-tuning facebook/wav2vec2-xls-r-300m for Turkish. Custom pre-processing and loading steps were implemented to support multiple datasets, and hyperparameters were tuned to achieve good performance on Turkish speech recognition.
## License
This project is licensed under the Apache-2.0 license.