# wav2vec2-xls-r-300m-cv7-turkish
This ASR model is a fine-tuned version of facebook/wav2vec2-xls-r-300m for Turkish, offering high-quality automatic speech recognition for the Turkish language.
## 🚀 Quick Start
Before running the evaluation, install the unicode_tr package, which is required for Turkish text processing:
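```bash
pip install unicode_tr
```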
Evaluate on mozilla-foundation/common_voice_7_0 with split test:
```bash
python eval.py --model_id mpoyraz/wav2vec2-xls-r-300m-cv7-turkish --dataset mozilla-foundation/common_voice_7_0 --config tr --split test
```
Evaluate on speech-recognition-community-v2/dev_data:
```bash
python eval.py --model_id mpoyraz/wav2vec2-xls-r-300m-cv7-turkish --dataset speech-recognition-community-v2/dev_data --config tr --split validation --chunk_length_s 5.0 --stride_length_s 1.0
```
## ✨ Features
- Fine-tuned for Turkish: based on the pre-trained facebook/wav2vec2-xls-r-300m model and fine-tuned on Turkish speech data, giving stronger performance for Turkish speech recognition.
- Multi-dataset training: trained on multiple datasets, namely Common Voice 7.0 TR and MediaSpeech.
## 📦 Installation
The original document does not provide explicit installation steps. To use this model, install the relevant dependencies, such as unicode_tr, transformers, pytorch, datasets, and tokenizers, matching the framework versions listed under Documentation.
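A minimal sketch, assuming a pip-based environment; the pins follow the framework versions listed in this card (the card lists a 4.16.0.dev0 build of Transformers, approximated here with the 4.16.0 release):

```bash
# Assumed pip invocation pinning the versions from "Framework versions" below
pip install transformers==4.16.0 torch==1.10.1 datasets==1.17.0 tokenizers==0.10.3 unicode_tr
```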
## 📚 Documentation
### Model description
This ASR model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the Turkish language.
### Training and evaluation data
The following datasets were used for fine-tuning:
- Common Voice 7.0 TR
- MediaSpeech
### Training procedure
To support both of the datasets above, custom pre-processing and loading steps were performed; the wav2vec2-turkish repo was used for that purpose.
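To illustrate the kind of Turkish-aware text normalization this involves, here is a minimal sketch using the unicode_tr package; it is not the exact pre-processing code from the wav2vec2-turkish repo:

```python
# Sketch of Turkish-aware lowercasing with unicode_tr (illustrative only,
# not the actual pre-processing from the wav2vec2-turkish repo).
from unicode_tr import unicode_tr

def normalize_transcript(text: str) -> str:
    # Python's built-in str.lower() maps "I" to "i"; Turkish requires
    # "I" -> "ı" and "İ" -> "i", which unicode_tr handles correctly.
    return str(unicode_tr(text).lower())

print(normalize_transcript("İSTANBUL VE IĞDIR"))  # -> "istanbul ve ığdır"
```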
### Training hyperparameters
The following hyperparameters were used for fine-tuning (a sketch mapping them onto the training API follows the list):
- learning_rate: 2e-4
- num_train_epochs: 10
- warmup_steps: 500
- freeze_feature_extractor
- mask_time_prob: 0.1
- mask_feature_prob: 0.05
- feat_proj_dropout: 0.05
- attention_dropout: 0.05
- final_dropout: 0.05
- activation_dropout: 0.05
- per_device_train_batch_size: 8
- per_device_eval_batch_size: 8
- gradient_accumulation_steps: 8
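As a rough illustration, assuming a standard transformers Trainer setup (the author's actual training script may differ), the listed values map onto the API roughly as follows:

```python
# A sketch, not the author's script: the hyperparameters above mapped onto
# transformers' TrainingArguments and Wav2Vec2 config options.
from transformers import TrainingArguments, Wav2Vec2ForCTC

model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-xls-r-300m",
    mask_time_prob=0.1,
    mask_feature_prob=0.05,
    feat_proj_dropout=0.05,
    attention_dropout=0.05,
    final_dropout=0.05,
    activation_dropout=0.05,
)
model.freeze_feature_extractor()  # corresponds to "freeze_feature_extractor"

training_args = TrainingArguments(
    output_dir="wav2vec2-xls-r-300m-cv7-turkish",  # assumed output path
    learning_rate=2e-4,
    num_train_epochs=10,
    warmup_steps=500,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,
)
```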
### Framework versions
- Transformers: 4.16.0.dev0
- Pytorch: 1.10.1
- Datasets: 1.17.0
- Tokenizers: 0.10.3
### Language Model
An N-gram language model was trained on Turkish Wikipedia articles using KenLM; the ngram-lm-wiki repo was used to generate the ARPA LM and convert it into binary format.
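For reference, a typical KenLM invocation looks like the following; the corpus and output file names, and the n-gram order, are assumptions here, and the actual pipeline lives in the ngram-lm-wiki repo:

```bash
# Illustrative KenLM commands; file names and order are assumed, not from the repo.
lmplz -o 5 < wiki_tr_corpus.txt > wiki_tr.arpa  # train the n-gram LM
build_binary wiki_tr.arpa wiki_tr.binary        # convert ARPA to binary format
```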
### Evaluation results
| Dataset | WER (%) | CER (%) |
| --- | --- | --- |
| Common Voice 7 TR test split | 8.62 | 2.26 |
| Speech Recognition Community dev data | 30.87 | 10.69 |
## 🔧 Technical Details
The model is based on the pre-trained facebook/wav2vec2-xls-r-300m and fine-tuned on Turkish speech data. Custom pre-processing and loading steps support multiple training datasets, and the N-gram language model trained on Turkish Wikipedia articles improves recognition accuracy during decoding.
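As a usage illustration, here is a minimal inference sketch; the audio file name is hypothetical and the input is assumed to be 16 kHz mono audio:

```python
# Minimal inference sketch; "sample_tr.wav" is a hypothetical local file.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="mpoyraz/wav2vec2-xls-r-300m-cv7-turkish",
)
print(asr("sample_tr.wav")["text"])
```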
## 📄 License
The model is licensed under CC-BY-4.0.