wav2vec2-xls-r-300m-cv8-turkish: Open-Source Automatic Speech Recognition Model for Turkish

Wav2vec2 Xls R 300m Cv8 Turkish

Developed by mpoyraz

Turkish automatic speech recognition model fine-tuned from facebook/wav2vec2-xls-r-300m, trained on Common Voice 8.0 TR dataset

Speech Recognition

Transformers

Open Source License: Apache-2.0

#Turkish speech recognition #Low CER transcription #Common Voice optimization

Downloads 382

Release Time: 3/2/2022

Model Overview

This model is an automatic speech recognition (ASR) system for Turkish, fine-tuned from facebook/wav2vec2-xls-r-300m. It converts Turkish audio to text.

Model Features

High-performance Turkish recognition

Achieves 10.61% WER and 2.67% CER on Common Voice 8 test set

Based on XLS-R architecture

Uses facebook/wav2vec2-xls-r-300m as the base model for speech feature extraction

Custom language model support

Includes an N-gram language model trained on Turkish Wikipedia with KenLM to improve recognition accuracy

Model Capabilities

Turkish audio to text conversion

Long audio processing (supports chunk processing)

High-accuracy speech recognition

Use Cases

Speech transcription

Turkish speech to text

Convert Turkish speech content into editable text format

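Achieves 10.61% WER and 2.67% CER on the Common Voice 8 TR test split; on the Speech Recognition Community dev data, WER is 36.46% and CER is 12.38%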

Voice assistants

Turkish voice command recognition

Provides core recognition capability for Turkish voice assistants

🚀 wav2vec2-xls-r-300m-cv8-turkish

This is an Automatic Speech Recognition (ASR) model fine-tuned for the Turkish language.

🚀 Quick Start

Before running evaluation, please install the unicode_tr package, which is used for Turkish text processing.
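Turkish casing is one reason a dedicated text-processing package is useful here: Turkish has a dotted and a dotless I, and generic lowercasing rules map them incorrectly. The sketch below illustrates the problem in plain Python. `turkish_lower` is a hypothetical helper written for this example; it is not the unicode_tr API and covers only lowercasing.

```python
def turkish_lower(text: str) -> str:
    """Lowercase text using Turkish rules for the letters I and İ.

    Hypothetical helper for illustration only (not part of unicode_tr).
    """
    # Map the two Turkish-specific capitals first, then lowercase the rest.
    return text.replace("I", "ı").replace("İ", "i").lower()


# Python's default lowercasing turns "I" into a dotted "i".
print("ISPARTA".lower())          # isparta
# Turkish rules turn "I" into the dotless "ı".
print(turkish_lower("ISPARTA"))   # ısparta
```

Normalizing references and predictions with the same Turkish-aware rule keeps casing differences from being counted as recognition errors.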

Evaluate on `mozilla-foundation/common_voice_8_0` with split `test`

python eval.py --model_id mpoyraz/wav2vec2-xls-r-300m-cv8-turkish --dataset mozilla-foundation/common_voice_8_0 --config tr --split test

Evaluate on `speech-recognition-community-v2/dev_data`

python eval.py --model_id mpoyraz/wav2vec2-xls-r-300m-cv8-turkish --dataset speech-recognition-community-v2/dev_data --config tr --split validation --chunk_length_s 5.0 --stride_length_s 1.0
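The `--chunk_length_s 5.0 --stride_length_s 1.0` flags split long audio into overlapping windows. The sketch below shows roughly how such windows could be laid out, assuming the stride is applied on both sides of each chunk and dropped at the start and end of the recording. `chunk_windows` is a hypothetical helper for illustration, working in seconds rather than samples; it is not the code used by eval.py.

```python
def chunk_windows(total_s, chunk_length_s=5.0, stride_length_s=1.0):
    """Return (start, end, left_stride, right_stride) tuples in seconds.

    Simplified illustration: each chunk overlaps its neighbours by the
    stride, and only the part between the strides is kept when the
    per-chunk transcripts are merged.
    """
    step = chunk_length_s - 2 * stride_length_s
    if step <= 0:
        raise ValueError("chunk_length_s must be greater than 2 * stride_length_s")

    windows = []
    start = 0.0
    while start < total_s:
        end = min(start + chunk_length_s, total_s)
        is_last = end >= total_s
        left = 0.0 if start == 0.0 else stride_length_s   # no left context at the start
        right = 0.0 if is_last else stride_length_s       # no right context at the end
        windows.append((start, end, left, right))
        if is_last:
            break
        start += step
    return windows


for window in chunk_windows(12.0):
    print(window)
```

With these settings, consecutive chunks start 3 seconds apart, so a 12-second recording is covered by four windows whose kept regions join without gaps.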

✨ Features

This ASR model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the Turkish language.
An N-gram language model is trained on Turkish Wikipedia articles using KenLM.

📦 Installation

The model card does not list separate installation steps. The only dependency it names is the unicode_tr package, which must be installed before running evaluation.

📚 Documentation

Model Description

This ASR model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the Turkish language.

Training and Evaluation Data

The following dataset was used for fine-tuning:

Common Voice 8.0 TR. The entire validated split, excluding the test split, was used for training.

Training Procedure

Custom pre-processing and loading steps were needed to support this dataset; the wav2vec2-turkish repo was used for that purpose.

Training Hyperparameters

The following hyperparameters were used for fine-tuning:

learning_rate: 2.5e-4
num_train_epochs: 20
warmup_steps: 500
freeze_feature_extractor
mask_time_prob: 0.1
mask_feature_prob: 0.1
feat_proj_dropout: 0.05
attention_dropout: 0.05
final_dropout: 0.1
activation_dropout: 0.05
per_device_train_batch_size: 8
per_device_eval_batch_size: 8
gradient_accumulation_steps: 8
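The batch size and gradient accumulation settings combine into an effective batch size per optimizer step. The short calculation below makes this explicit. The number of devices is not stated in the model card, so `num_devices` is an assumption that must be supplied.

```python
# Values taken from the hyperparameter list above.
per_device_train_batch_size = 8
gradient_accumulation_steps = 8


def effective_batch_size(num_devices: int) -> int:
    """Samples contributing to one optimizer step.

    num_devices is an assumption; the model card does not state it.
    """
    return per_device_train_batch_size * gradient_accumulation_steps * num_devices


print(effective_batch_size(num_devices=1))  # 64
```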

Framework Versions

Transformers: 4.17.0.dev0
Pytorch: 1.10.1
Datasets: 1.17.0
Tokenizers: 0.10.3

Language Model

An N-gram language model is trained on Turkish Wikipedia articles using KenLM, and the ngram-lm-wiki repo was used to generate an ARPA LM and convert it into binary format.

Evaluation Results

Dataset	WER (%)	CER (%)
Common Voice 8 TR test split	10.61	2.67
Speech Recognition Community dev data	36.46	12.38
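For readers unfamiliar with the two metrics, WER and CER are edit distances between reference and prediction, normalized by the reference length and counted over words and characters respectively. The sketch below is a minimal single-sentence implementation; the function names and the example sentences are chosen for illustration, and it is not the metric code used by eval.py. Evaluation scripts typically aggregate errors over the whole test set rather than averaging per sentence.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitute, insert, delete)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent for a single sentence pair."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent for a single sentence pair."""
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)


reference = "bugün hava çok güzel"
hypothesis = "bugün hava çok güzl"   # one character missing in the last word
print(wer(reference, hypothesis))   # 25.0
print(cer(reference, hypothesis))   # 5.0
```

A single missing character makes one of four words wrong but only one of twenty characters, which is why CER figures are usually much lower than WER for the same output.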

🔧 Technical Details

The model mpoyraz/wav2vec2-xls-r-300m-cv8-turkish is a fine-tuned version of facebook/wav2vec2-xls-r-300m for the Turkish language. Custom pre-processing and loading steps were performed to support the mozilla-foundation/common_voice_8_0 dataset. The fine-tuning hyperparameters are listed under Training Hyperparameters, and an N-gram language model was trained on Turkish Wikipedia articles using KenLM.

📄 License

This model is licensed under the Apache 2.0 license.

📊 Model Index

Name: mpoyraz/wav2vec2-xls-r-300m-cv8-turkish
Results:
- Task:
    Name: Automatic Speech Recognition
    Type: automatic-speech-recognition
  Dataset:
    Name: Common Voice 8
    Type: mozilla-foundation/common_voice_8_0
    Args: tr
  Metrics:
  - Name: Test WER
    Type: wer
    Value: 10.61
  - Name: Test CER
    Type: cer
    Value: 2.67
- Task:
    Name: Automatic Speech Recognition
    Type: automatic-speech-recognition
  Dataset:
    Name: Robust Speech Event - Dev Data
    Type: speech-recognition-community-v2/dev_data
    Args: tr
  Metrics:
  - Name: Test WER
    Type: wer
    Value: 36.46
  - Name: Test CER
    Type: cer
    Value: 12.38
- Task:
    Name: Automatic Speech Recognition
    Type: automatic-speech-recognition
  Dataset:
    Name: Robust Speech Event - Test Data
    Type: speech-recognition-community-v2/eval_data
    Args: tr
  Metrics:
  - Name: Test WER
    Type: wer
    Value: 40.91
