wav2vec2-xlsr-tatar Open-source Speech Recognition Model - Efficiently Identify Tatar Speech Content

Wav2vec2 Xlsr Tatar

Developed by sammy786

This automatic speech recognition model is facebook/wav2vec2-xls-r-1b fine-tuned on Tatar speech data. It achieves a word error rate (WER) of 16.87% on the Common Voice 8 dataset.

Speech Recognition

Transformers

Open Source License: Apache-2.0

Tags: Tatar speech recognition, Low word error rate, Multi-dialect support

Downloads 17

Release Time: 3/2/2022

Model Overview

A model for Tatar automatic speech recognition, fine-tuned from the pre-trained facebook/wav2vec2-xls-r-1b checkpoint

Model Features

Low word error rate

Achieves a word error rate (WER) of 16.87% and a character error rate (CER) of 3.64% on Tatar test sets

Based on large-scale pre-trained model

Fine-tuned from the facebook/wav2vec2-xls-r-1b model, inheriting its powerful speech feature extraction capabilities

Optimized for Tatar

Specifically optimized for Tatar speech data, suitable for Tatar speech recognition scenarios

Model Capabilities

Tatar speech recognition

Speech-to-text

Continuous speech recognition

Use Cases

Speech transcription

Tatar speech transcription

Convert Tatar speech content into text

Word error rate 16.87%, character error rate 3.64%

Voice assistants

Tatar voice interaction

Provides speech recognition capabilities for Tatar voice assistants

🚀 sammy786/wav2vec2-xlsr-tatar

This model is a fine-tuned version of facebook/wav2vec2-xls-r-1b on the mozilla-foundation/common_voice_8_0 dataset (tt configuration). It is designed for automatic speech recognition, converting Tatar speech to text.

🚀 Quick Start

This model is a fine-tuned version of facebook/wav2vec2-xls-r-1b on the mozilla-foundation/common_voice_8_0 dataset (tt configuration). It achieves the following results on the evaluation set, which is a 10 percent hold-out from the merged train, other, and dev splits:

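Loss: 7.66 (apparently scaled by 100; the training results table reports a final validation loss of about 0.077)
WER: 7.08% (about 0.071 as a fraction in the training results table)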
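The card does not include inference code, so the following is a minimal sketch of greedy CTC transcription with the Transformers library. It assumes the repository ships both a processor and a CTC model head, and that the input is a mono waveform already resampled to 16 kHz. The helper name `transcribe` is hypothetical and not part of the original card.

```python
MODEL_ID = "sammy786/wav2vec2-xlsr-tatar"
TARGET_SAMPLE_RATE = 16_000  # XLS-R checkpoints were pre-trained on 16 kHz audio


def transcribe(waveform, sample_rate=TARGET_SAMPLE_RATE):
    """Greedy CTC decoding of a 1-D float waveform (hypothetical helper)."""
    if sample_rate != TARGET_SAMPLE_RATE:
        raise ValueError(
            f"resample to {TARGET_SAMPLE_RATE} Hz first, got {sample_rate} Hz"
        )

    # Imported lazily so that defining the helper needs no heavy dependencies.
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    # Assumption: the repository contains processor and model files.
    processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
    model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)
    model.eval()

    inputs = processor(waveform, sampling_rate=sample_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (batch, frames, vocab)

    predicted_ids = torch.argmax(logits, dim=-1)  # most likely token per frame
    return processor.batch_decode(predicted_ids)[0]
```

To use it, load an audio file into a NumPy array with any audio library, resample it to 16 kHz if needed, and pass the array to `transcribe`. The first call downloads the checkpoint, which is large because the base model has about one billion parameters.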

✨ Features

Fine-Tuned: Based on the pre-trained facebook/wav2vec2-xls-r-1b model, fine-tuned on the mozilla-foundation/common_voice_8_0 dataset (tt configuration).
High Performance: Achieves a WER of 7.08% on the held-out evaluation set, and a WER of 16.87% with a CER of 3.64% on the Tatar test set.
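For readers unfamiliar with the two metrics, here is a small sketch of how WER and CER are conventionally defined: the Levenshtein distance between reference and hypothesis, counted over words or characters, divided by the reference length. This is a per-utterance illustration and not the evaluation script used for this model; the example sentences are English placeholders.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate of one utterance (reference must be non-empty)."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate of one utterance, spaces included."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


reference = "the cat sat on the mat"
hypothesis = "the cat sit on mat"
print(f"WER = {wer(reference, hypothesis):.3f}")
print(f"CER = {cer(reference, hypothesis):.3f}")
```

CER is usually much lower than WER, as in the figures reported here, because a single wrong character makes a whole word count as an error.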

📚 Documentation

Model description

The facebook/wav2vec2-xls-r-1b checkpoint was fine-tuned for Tatar automatic speech recognition.

Intended uses & limitations

More information needed

Training and evaluation data

Training data: Common Voice Tatar train.tsv, dev.tsv and other.tsv

Training procedure

To create the training dataset, all available splits were concatenated and then divided 90-10 into training and evaluation sets.
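The procedure above can be sketched in plain Python as follows. The card does not say whether the pooled data was shuffled or which tool performed the split, so the shuffle, the reuse of seed 13 from the hyperparameters, and the function name `concat_and_split` are all assumptions made for illustration.

```python
import random


def concat_and_split(splits, eval_fraction=0.1, seed=13):
    """Pool several lists of examples, shuffle them, and hold out eval_fraction."""
    pooled = [example for split in splits for example in split]
    rng = random.Random(seed)  # assumed: seed 13 is borrowed from the listed hyperparameters
    rng.shuffle(pooled)
    n_eval = int(len(pooled) * eval_fraction)
    # The first n_eval shuffled examples become the evaluation set.
    return pooled[n_eval:], pooled[:n_eval]


# Placeholder integers stand in for rows of train.tsv, dev.tsv and other.tsv.
train_rows = list(range(0, 50))
dev_rows = list(range(50, 80))
other_rows = list(range(80, 100))

train_set, eval_set = concat_and_split([train_rows, dev_rows, other_rows])
print(len(train_set), len(eval_set))
```

Because the evaluation set is drawn from the same pool as the training data, its scores are not directly comparable with scores on the separate Common Voice test split.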

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.000045637994662983496
train_batch_size: 16
eval_batch_size: 16
seed: 13
gradient_accumulation_steps: 2
total_train_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine_with_restarts
lr_scheduler_warmup_steps: 500
num_epochs: 40
mixed_precision_training: Native AMP
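Two of the values above are derived rather than independent. The total batch size of 32 is the per-step batch size times the gradient accumulation steps, which implies a single device. The learning rate follows a linear warmup and then a cosine curve with hard restarts. The sketch below reproduces both in plain Python using the commonly used formula for this schedule. The total step count and `num_cycles=1` are assumptions: the card lists neither, and 13,000 is only the last step shown in the training results.

```python
import math

LEARNING_RATE = 0.000045637994662983496
WARMUP_STEPS = 500
TRAIN_BATCH_SIZE = 16
GRAD_ACCUM_STEPS = 2
ASSUMED_TOTAL_STEPS = 13_000  # assumption: last logged step, not a documented value


def effective_batch_size(per_step=TRAIN_BATCH_SIZE, accum=GRAD_ACCUM_STEPS, n_devices=1):
    """Examples contributing to each optimizer update."""
    return per_step * accum * n_devices


def lr_at(step, total_steps=ASSUMED_TOTAL_STEPS, num_cycles=1,
          peak_lr=LEARNING_RATE, warmup=WARMUP_STEPS):
    """Linear warmup followed by cosine decay with hard restarts."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    if progress >= 1.0:
        return 0.0
    # The modulo restarts the cosine at the start of each cycle.
    cosine = 0.5 * (1.0 + math.cos(math.pi * ((num_cycles * progress) % 1.0)))
    return peak_lr * max(0.0, cosine)


print(effective_batch_size())
for step in (0, 250, 500, 6_750, 13_000):
    print(step, lr_at(step))
```

With a single cycle the schedule is an ordinary cosine decay. Restarts only appear when `num_cycles` is greater than 1.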

Training results

Step	Training Loss	Validation Loss	Wer
200	4.849400	1.874908	0.995232
400	1.105700	0.257292	0.367658
600	0.723000	0.181150	0.250513
800	0.660600	0.167009	0.226078
1000	0.568000	0.135090	0.177339
1200	0.721200	0.117469	0.166413
1400	0.416300	0.115142	0.153765
1600	0.346000	0.105782	0.153963
1800	0.279700	0.102452	0.146149
2000	0.273800	0.095818	0.128468
2200	0.252900	0.102302	0.133766
2400	0.255100	0.096592	0.121316
2600	0.229600	0.091263	0.124561
2800	0.213900	0.097748	0.125687
3000	0.210700	0.091244	0.125422
3200	0.202600	0.084076	0.106284
3400	0.200900	0.093809	0.113238
3600	0.192700	0.082918	0.108139
3800	0.182000	0.084487	0.103371
4000	0.167700	0.091847	0.104960
4200	0.183700	0.085223	0.103040
4400	0.174400	0.083862	0.100589
4600	0.163100	0.086493	0.099728
4800	0.162000	0.081734	0.097543
5000	0.153600	0.077223	0.092974
5200	0.153700	0.086217	0.090789
5400	0.140200	0.093256	0.100457
5600	0.142900	0.086903	0.097742
5800	0.131400	0.083068	0.095225
6000	0.126000	0.086642	0.091252
6200	0.135300	0.083387	0.091186
6400	0.126100	0.076479	0.086352
6600	0.127100	0.077868	0.086153
6800	0.118000	0.083878	0.087676
7000	0.117600	0.085779	0.091054
7200	0.113600	0.084197	0.084233
7400	0.112000	0.078688	0.081319
7600	0.110200	0.082534	0.086087
7800	0.106400	0.077245	0.080988
8000	0.102300	0.077497	0.079332
8200	0.109500	0.079083	0.088339
8400	0.095900	0.079721	0.077809
8600	0.094700	0.079078	0.079730
8800	0.097400	0.078785	0.079200
9000	0.093200	0.077445	0.077015
9200	0.088700	0.078207	0.076617
9400	0.087200	0.078982	0.076485
9600	0.089900	0.081209	0.076021
9800	0.081900	0.078158	0.075757
10000	0.080200	0.078074	0.074498
10200	0.085000	0.078830	0.073373
10400	0.080400	0.078144	0.073373
10600	0.078200	0.077163	0.073902
10800	0.080900	0.076394	0.072446
11000	0.080700	0.075955	0.071585
11200	0.076800	0.077031	0.072313
11400	0.076300	0.077401	0.072777
11600	0.076700	0.076613	0.071916
11800	0.076000	0.076672	0.071916
12000	0.077200	0.076490	0.070989
12200	0.076200	0.076688	0.070856
12400	0.074400	0.076780	0.071055
12600	0.076300	0.076768	0.071320
12800	0.077600	0.076727	0.071055
13000	0.077700	0.076714	0.071254
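A log like the one above is easy to inspect programmatically. The following sketch parses a tab-separated excerpt (the last six rows, copied from the table) and picks the step with the lowest WER. The helper name `parse_log` is hypothetical, and whether the released checkpoint was selected this way is not stated in the card.

```python
LOG_EXCERPT = (
    "Step\tTraining Loss\tValidation Loss\tWer\n"
    "12000\t0.077200\t0.076490\t0.070989\n"
    "12200\t0.076200\t0.076688\t0.070856\n"
    "12400\t0.074400\t0.076780\t0.071055\n"
    "12600\t0.076300\t0.076768\t0.071320\n"
    "12800\t0.077600\t0.076727\t0.071055\n"
    "13000\t0.077700\t0.076714\t0.071254\n"
)


def parse_log(text):
    """Turn a tab-separated training log into a list of row dictionaries."""
    lines = text.strip().splitlines()
    header = lines[0].split("\t")
    rows = []
    for line in lines[1:]:
        row = {name: float(value) for name, value in zip(header, line.split("\t"))}
        row["Step"] = int(row["Step"])  # steps are whole numbers
        rows.append(row)
    return rows


rows = parse_log(LOG_EXCERPT)
best = min(rows, key=lambda r: r["Wer"])
print("lowest WER in excerpt:", best["Step"], best["Wer"])
```

In this excerpt the lowest WER and the lowest validation loss fall on different steps, so the choice of selection metric matters when picking a checkpoint.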

Framework versions

Transformers 4.16.0.dev0
Pytorch 1.10.0+cu102
Datasets 1.17.1.dev0
Tokenizers 0.10.3

Evaluation Commands

To evaluate on mozilla-foundation/common_voice_8_0 with the test split:

python eval.py --model_id sammy786/wav2vec2-xlsr-tatar --dataset mozilla-foundation/common_voice_8_0 --config tt --split test

📄 License

This project is licensed under the Apache-2.0 license.
