# wav2vec2-xls-r-300m-cv7-turkish
This ASR model is a fine-tuned version of facebook/wav2vec2-xls-r-300m for Turkish, offering high-quality automatic speech recognition for the Turkish language.
## 🚀 Quick Start
Before running the evaluation, install the unicode_tr package, which is required for Turkish text processing:
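```bash
pip install unicode_tr
```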
Evaluate on mozilla-foundation/common_voice_7_0 with split test:
```bash
python eval.py --model_id mpoyraz/wav2vec2-xls-r-300m-cv7-turkish --dataset mozilla-foundation/common_voice_7_0 --config tr --split test
```
Evaluate on speech-recognition-community-v2/dev_data:
```bash
python eval.py --model_id mpoyraz/wav2vec2-xls-r-300m-cv7-turkish --dataset speech-recognition-community-v2/dev_data --config tr --split validation --chunk_length_s 5.0 --stride_length_s 1.0
```
## ✨ Features
- Fine-tuned for Turkish: based on the pre-trained facebook/wav2vec2-xls-r-300m model and fine-tuned on Turkish speech data, giving stronger performance for Turkish speech recognition.
- Multi-dataset training: trained on multiple datasets, namely Common Voice 7.0 TR and MediaSpeech.
## 📦 Installation
The original document does not provide explicit installation steps. To use this model, install the relevant dependencies, such as unicode_tr, transformers, pytorch, datasets, and tokenizers, matching the framework versions listed under Documentation.
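A minimal sketch, assuming a pip-based environment; the pins follow the framework versions listed in this card (the card lists a 4.16.0.dev0 build of Transformers, approximated here with the 4.16.0 release):

```bash
# Assumed pip invocation pinning the versions from "Framework versions" below
pip install transformers==4.16.0 torch==1.10.1 datasets==1.17.0 tokenizers==0.10.3 unicode_tr
```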
## 📚 Documentation
### Model description
This ASR model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the Turkish language.
### Training and evaluation data
The following datasets were used for fine-tuning:
- Common Voice 7.0 TR
- MediaSpeech
### Training procedure
To support both of the datasets above, custom pre-processing and loading steps were performed; the wav2vec2-turkish repo was used for that purpose.
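To illustrate the kind of Turkish-aware text normalization this involves, here is a minimal sketch using the unicode_tr package; it is not the exact pre-processing code from the wav2vec2-turkish repo:

```python
# Sketch of Turkish-aware lowercasing with unicode_tr (illustrative only,
# not the actual pre-processing from the wav2vec2-turkish repo).
from unicode_tr import unicode_tr

def normalize_transcript(text: str) -> str:
    # Python's built-in str.lower() maps "I" to "i"; Turkish requires
    # "I" -> "ı" and "İ" -> "i", which unicode_tr handles correctly.
    return str(unicode_tr(text).lower())

print(normalize_transcript("İSTANBUL VE IĞDIR"))  # -> "istanbul ve ığdır"
```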
### Training hyperparameters
The following hyperparameters were used for fine-tuning (a sketch mapping them onto the training API follows the list):
- learning_rate: 2e-4
- num_train_epochs: 10
- warmup_steps: 500
- freeze_feature_extractor
- mask_time_prob: 0.1
- mask_feature_prob: 0.05
- feat_proj_dropout: 0.05
- attention_dropout: 0.05
- final_dropout: 0.05
- activation_dropout: 0.05
- per_device_train_batch_size: 8
- per_device_eval_batch_size: 8
- gradient_accumulation_steps: 8
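As a rough illustration, assuming a standard transformers Trainer setup (the author's actual training script may differ), the listed values map onto the API roughly as follows:

```python
# A sketch, not the author's script: the hyperparameters above mapped onto
# transformers' TrainingArguments and Wav2Vec2 config options.
from transformers import TrainingArguments, Wav2Vec2ForCTC

model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-xls-r-300m",
    mask_time_prob=0.1,
    mask_feature_prob=0.05,
    feat_proj_dropout=0.05,
    attention_dropout=0.05,
    final_dropout=0.05,
    activation_dropout=0.05,
)
model.freeze_feature_extractor()  # corresponds to "freeze_feature_extractor"

training_args = TrainingArguments(
    output_dir="wav2vec2-xls-r-300m-cv7-turkish",  # assumed output path
    learning_rate=2e-4,
    num_train_epochs=10,
    warmup_steps=500,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,
)
```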
### Framework versions
- Transformers: 4.16.0.dev0
- Pytorch: 1.10.1
- Datasets: 1.17.0
- Tokenizers: 0.10.3
### Language Model
An N-gram language model was trained on Turkish Wikipedia articles using KenLM; the ngram-lm-wiki repo was used to generate the ARPA LM and convert it into binary format.
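For reference, a typical KenLM invocation looks like the following; the corpus and output file names, and the n-gram order, are assumptions here, and the actual pipeline lives in the ngram-lm-wiki repo:

```bash
# Illustrative KenLM commands; file names and order are assumed, not from the repo.
lmplz -o 5 < wiki_tr_corpus.txt > wiki_tr.arpa  # train the n-gram LM
build_binary wiki_tr.arpa wiki_tr.binary        # convert ARPA to binary format
```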
### Evaluation results
| Dataset | WER (%) | CER (%) |
| --- | --- | --- |
| Common Voice 7 TR test split | 8.62 | 2.26 |
| Speech Recognition Community dev data | 30.87 | 10.69 |
## 🔧 Technical Details
The model is based on the pre-trained facebook/wav2vec2-xls-r-300m and fine-tuned on Turkish speech data. Custom pre-processing and loading steps support multiple training datasets, and the N-gram language model trained on Turkish Wikipedia articles improves recognition accuracy during decoding.
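As a usage illustration, here is a minimal inference sketch; the audio file name is hypothetical and the input is assumed to be 16 kHz mono audio:

```python
# Minimal inference sketch; "sample_tr.wav" is a hypothetical local file.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="mpoyraz/wav2vec2-xls-r-300m-cv7-turkish",
)
print(asr("sample_tr.wav")["text"])
```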
## 📄 License
The model is licensed under CC-BY-4.0.