Baybars/wav2vec2-xls-r-300m-cv8-turkish
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the COMMON_VOICE - TR dataset. It provides high-quality automatic speech recognition for Turkish; the evaluation loss, WER, and CER are reported below.
🚀 Quick Start
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the COMMON_VOICE - TR dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4164
- Wer: 0.3098
- Cer: 0.0764
✨ Features
- High-accuracy automatic speech recognition for Turkish.
- Fine-tuned on the COMMON_VOICE - TR dataset.
- Utilizes an N-gram language model trained on Turkish Wikipedia articles.
📦 Installation
Please install the unicode_tr package before running evaluation; it is used for Turkish text processing.
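If it is not already available in your environment, it can be installed from PyPI, e.g. with `pip install unicode_tr`.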
💻 Usage Examples
Evaluation on mozilla-foundation/common_voice_8_0 with split test:

```bash
python eval.py --model_id Baybars/wav2vec2-xls-r-300m-cv8-turkish --dataset mozilla-foundation/common_voice_8_0 --config tr --split test
```
Evaluation on speech-recognition-community-v2/dev_data with split validation:

```bash
python eval.py --model_id Baybars/wav2vec2-xls-r-300m-cv8-turkish --dataset speech-recognition-community-v2/dev_data --config tr --split validation --chunk_length_s 5.0 --stride_length_s 1.0
```
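Beyond the evaluation scripts, the model can be used directly for transcription. Below is a minimal inference sketch using the `transformers` pipeline API; `sample.wav` is a placeholder for your own Turkish audio file.

```python
from transformers import pipeline

# Load the model into an automatic-speech-recognition pipeline.
asr = pipeline(
    "automatic-speech-recognition",
    model="Baybars/wav2vec2-xls-r-300m-cv8-turkish",
)

# Transcribe a local audio file (16 kHz mono audio works best for wav2vec2 models).
result = asr("sample.wav")  # placeholder path
print(result["text"])
```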
📚 Documentation
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Language Model
The N-gram language model was trained by mpoyraz on Turkish Wikipedia articles using KenLM, and the ngram-lm-wiki repo was used to generate the ARPA LM and convert it into binary format.
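One possible way to use such a KenLM binary during decoding is `pyctcdecode`, which combines the model's CTC output with the language model via beam search. The sketch below assumes the binary LM is available locally as `lm.binary` (a hypothetical path); it is not the exact decoding setup used by the author.

```python
from pyctcdecode import build_ctcdecoder
from transformers import Wav2Vec2Processor

# Load the processor to recover the CTC vocabulary, ordered by token id.
processor = Wav2Vec2Processor.from_pretrained("Baybars/wav2vec2-xls-r-300m-cv8-turkish")
vocab = processor.tokenizer.get_vocab()
labels = [tok for tok, _ in sorted(vocab.items(), key=lambda kv: kv[1])]

# Build a beam-search decoder backed by the KenLM binary.
decoder = build_ctcdecoder(labels, kenlm_model_path="lm.binary")  # hypothetical path

# Given `logits`, a (time, vocab) numpy array from the model's CTC head:
# text = decoder.decode(logits)
```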
🔧 Technical Details
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 64
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 100.0
- mixed_precision_training: Native AMP
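For orientation, here is how these settings would map onto Hugging Face `TrainingArguments`, assuming a standard `Trainer` setup. This is a sketch, not the exact training script; `output_dir` is a placeholder, and the batch size is taken as per-device.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./wav2vec2-xls-r-300m-cv8-turkish",  # placeholder
    learning_rate=5e-4,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=100.0,
    fp16=True,  # Native AMP mixed-precision training
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the optimizer defaults.
)
```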
Training results
| Training Loss | Epoch | Step | Validation Loss | Wer    | Cer    |
|:-------------:|:-----:|:----:|:---------------:|:------:|:------:|
| 0.6356        | 9.09  | 500  | 0.5055          | 0.5536 | 0.1381 |
| 0.3847        | 18.18 | 1000 | 0.4002          | 0.4247 | 0.1065 |
| 0.3377        | 27.27 | 1500 | 0.4193          | 0.4167 | 0.1078 |
| 0.2175        | 36.36 | 2000 | 0.4351          | 0.3861 | 0.0974 |
| 0.2074        | 45.45 | 2500 | 0.3962          | 0.3622 | 0.0916 |
| 0.159         | 54.55 | 3000 | 0.4062          | 0.3526 | 0.0888 |
| 0.1882        | 63.64 | 3500 | 0.3991          | 0.3445 | 0.0850 |
| 0.1766        | 72.73 | 4000 | 0.4214          | 0.3396 | 0.0847 |
| 0.116         | 81.82 | 4500 | 0.4182          | 0.3265 | 0.0812 |
| 0.0718        | 90.91 | 5000 | 0.4259          | 0.3191 | 0.0781 |
| 0.019         | 100.0 | 5500 | 0.4164          | 0.3098 | 0.0764 |
Framework versions
- Transformers 4.17.0.dev0
- Pytorch 1.10.2+cu102
- Datasets 1.18.2.dev0
- Tokenizers 0.11.0
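The exact `.dev0` builds are not published on PyPI; nearby stable releases (e.g. `pip install transformers==4.17.0 torch==1.10.2 datasets==1.18.2 tokenizers==0.11.0`) should approximate this environment.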
📄 License
This project is licensed under the Apache-2.0 license.