wav2vec2-xls-r-300m-cv8-turkish
This is an Automatic Speech Recognition (ASR) model fine-tuned for the Turkish language.
Quick Start
Before running evaluation, please install the unicode_tr package, which is used for Turkish text processing.
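Turkish text needs special handling partly because of its dotted and dotless "i" letters, which Python's built-in case conversion does not treat the Turkish way. The sketch below only illustrates that problem with plain string operations; it is not the unicode_tr API, and the helper names turkish_lower and turkish_upper are hypothetical.

```python
def turkish_lower(text: str) -> str:
    """Lowercase text using Turkish rules for I/ı and İ/i (illustrative helper)."""
    # Handle the two Turkish-specific capitals before the generic conversion.
    return text.replace("I", "ı").replace("İ", "i").lower()


def turkish_upper(text: str) -> str:
    """Uppercase text using Turkish rules for i/İ and ı/I (illustrative helper)."""
    return text.replace("i", "İ").replace("ı", "I").upper()


if __name__ == "__main__":
    word = "IŞIK"
    print(word.lower())         # generic rules turn "I" into dotted "i"
    print(turkish_lower(word))  # Turkish rules turn "I" into dotless "ı"
```

Normalizing case consistently matters for evaluation because reference and predicted transcripts are compared as strings.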
Evaluate on mozilla-foundation/common_voice_8_0 with the test split
python eval.py --model_id mpoyraz/wav2vec2-xls-r-300m-cv8-turkish --dataset mozilla-foundation/common_voice_8_0 --config tr --split test
Evaluate on speech-recognition-community-v2/dev_data
python eval.py --model_id mpoyraz/wav2vec2-xls-r-300m-cv8-turkish --dataset speech-recognition-community-v2/dev_data --config tr --split validation --chunk_length_s 5.0 --stride_length_s 1.0
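The --chunk_length_s 5.0 and --stride_length_s 1.0 flags suggest that long recordings are transcribed in overlapping windows. The sketch below is a simplified illustration of that idea, not the code that eval.py or the underlying library actually runs. The 16 kHz sampling rate, the function name chunk_spans, and the use of equal overlap on both sides of each window are assumptions.

```python
def chunk_spans(num_samples, sampling_rate=16_000,
                chunk_length_s=5.0, stride_length_s=1.0):
    """Return (start, end) sample indices of overlapping windows.

    Each window is chunk_length_s long. Consecutive windows advance by
    chunk length minus twice the stride, so neighbours overlap.
    """
    chunk_len = int(round(chunk_length_s * sampling_rate))
    stride = int(round(stride_length_s * sampling_rate))
    step = chunk_len - 2 * stride
    if step <= 0:
        raise ValueError("chunk length must exceed twice the stride length")

    spans = []
    for start in range(0, num_samples, step):
        end = min(start + chunk_len, num_samples)
        spans.append((start, end))
        if end == num_samples:  # the last window reached the end of the audio
            break
    return spans


if __name__ == "__main__":
    # A hypothetical 12-second clip at the assumed 16 kHz sampling rate.
    for start, end in chunk_spans(12 * 16_000):
        print(f"{start / 16_000:.1f}s to {end / 16_000:.1f}s")
```

The overlap gives the model acoustic context on both sides of each window, which is the usual motivation for using a stride when chunking.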
Features
- This ASR model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the Turkish language.
- An N-gram language model is trained on Turkish Wikipedia articles using KenLM.
Documentation
Model Description
This ASR model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the Turkish language.
Training and Evaluation Data
The following datasets were used for finetuning:
Training Procedure
To support the above datasets, custom pre-processing and loading steps were performed using the wav2vec2-turkish repo.
Training Hyperparameters
The following hyperparameters were used for finetuning:
- learning_rate: 2.5e-4
- num_train_epochs: 20
- warmup_steps: 500
- freeze_feature_extractor
- mask_time_prob: 0.1
- mask_feature_prob: 0.1
- feat_proj_dropout: 0.05
- attention_dropout: 0.05
- final_dropout: 0.1
- activation_dropout: 0.05
- per_device_train_batch_size: 8
- per_device_eval_batch_size: 8
- gradient_accumulation_steps: 8
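Two of the values above interact: the per-device batch size and the gradient accumulation steps together determine how many examples contribute to each optimizer update. The sketch below works through that arithmetic and a linear warmup to the listed learning rate. The device count of 1, the linear shape of the warmup, and holding the rate constant afterwards are assumptions, since the list does not state the hardware or the scheduler.

```python
def effective_batch_size(per_device_batch_size, accumulation_steps, num_devices=1):
    """Examples contributing to one optimizer update."""
    return per_device_batch_size * accumulation_steps * num_devices


def warmup_lr(step, peak_lr=2.5e-4, warmup_steps=500):
    """Learning rate at a given optimizer step under an assumed linear warmup.

    After warmup the rate is held at peak_lr here; the real schedule may decay.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr


if __name__ == "__main__":
    # num_devices=1 is an assumption; the source does not state the GPU count.
    print(effective_batch_size(8, 8, num_devices=1))
    for step in (0, 250, 500, 1000):
        print(step, warmup_lr(step))
```

With more than one device, the effective batch size scales by the device count.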
Framework Versions
- Transformers: 4.17.0.dev0
- PyTorch: 1.10.1
- Datasets: 1.17.0
- Tokenizers: 0.10.3
Language Model
An N-gram language model was trained on Turkish Wikipedia articles using KenLM. The ngram-lm-wiki repo was used to generate an ARPA LM and convert it into binary format.
Evaluation Results
| Dataset | WER | CER |
|---|---|---|
| Common Voice 8 TR test split | 10.61 | 2.67 |
| Speech Recognition Community dev data | 36.46 | 12.38 |
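WER and CER are edit-distance metrics computed over words and characters respectively. The sketch below shows one common way to compute them for a single sentence pair, assuming the values in the table are percentages. It is not the scoring code used by eval.py, and the example sentences are made up for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitute, insert, delete)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent for one reference/hypothesis pair."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent for one reference/hypothesis pair."""
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    reference = "bugün hava çok güzel"   # made-up example sentence
    hypothesis = "bugün hava güzel"      # one word dropped
    print(wer(reference, hypothesis), cer(reference, hypothesis))
```

Corpus-level scores are normally computed by summing edits and reference lengths over all utterances rather than averaging per-sentence rates.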
Technical Details
The model mpoyraz/wav2vec2-xls-r-300m-cv8-turkish is a fine-tuned version of facebook/wav2vec2-xls-r-300m for the Turkish language. Custom pre-processing and loading steps are performed to support the mozilla-foundation/common_voice_8_0 dataset. The fine-tuning hyperparameters are listed under Training Hyperparameters, and an N-gram language model is trained on Turkish Wikipedia articles using KenLM.
License
This model is licensed under the Apache 2.0 license.
Model Index
- Name: mpoyraz/wav2vec2-xls-r-300m-cv8-turkish
  Results:
  - Task:
      Name: Automatic Speech Recognition
      Type: automatic-speech-recognition
    Dataset:
      Name: Common Voice 8
      Type: mozilla-foundation/common_voice_8_0
      Args: tr
    Metrics:
    - Name: Test WER
      Type: wer
      Value: 10.61
    - Name: Test CER
      Type: cer
      Value: 2.67
  - Task:
      Name: Automatic Speech Recognition
      Type: automatic-speech-recognition
    Dataset:
      Name: Robust Speech Event - Dev Data
      Type: speech-recognition-community-v2/dev_data
      Args: tr
    Metrics:
    - Name: Test WER
      Type: wer
      Value: 36.46
    - Name: Test CER
      Type: cer
      Value: 12.38
  - Task:
      Name: Automatic Speech Recognition
      Type: automatic-speech-recognition
    Dataset:
      Name: Robust Speech Event - Test Data
      Type: speech-recognition-community-v2/eval_data
      Args: tr
    Metrics:
    - Name: Test WER
      Type: wer
      Value: 40.91