# Automatic Speech Recognition Model
This model is a fine-tuned speech recognition model that addresses the challenges of transcribing Urdu speech. It offers high-accuracy transcription on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - UR dataset, providing valuable support for Urdu language processing.
## Quick Start
This model is a fine-tuned version of [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - UR dataset.
It achieves the following results on the evaluation set (final checkpoint, see the training results table below):
- Loss: 0.8886
- Wer: 0.6657
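A minimal inference sketch with the Hugging Face Transformers ASR pipeline is shown below. The model id `your-username/wav2vec2-large-xlsr-53-ur` and the audio file name are placeholders, not taken from this repository:

```python
# Minimal inference sketch, assuming this checkpoint is published on the
# Hugging Face Hub. "your-username/wav2vec2-large-xlsr-53-ur" is a
# placeholder model id; substitute the actual repository name.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="your-username/wav2vec2-large-xlsr-53-ur",  # placeholder
)

# The pipeline decodes and resamples audio as needed; wav2vec2 expects 16 kHz mono.
result = asr("urdu_sample.wav")  # placeholder path to an audio file
print(result["text"])
```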
## Documentation
### Model description
More information needed
### Intended uses & limitations
More information needed
### Training and evaluation data
More information needed
### Training procedure

#### Training hyperparameters
The following hyperparameters were used during training (a sketch of the corresponding training arguments follows the list):
- learning_rate: 7.5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 50
- num_epochs: 50.0
- mixed_precision_training: Native AMP
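For reference, a hedged sketch of how these settings map onto `transformers.TrainingArguments` (the output directory is an assumption; the original training script is not shown):

```python
# Sketch only: reconstructs the hyperparameters listed above with
# transformers.TrainingArguments. The output_dir is a made-up value.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./wav2vec2-large-xlsr-53-ur",  # assumed path
    learning_rate=7.5e-05,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,  # effective train batch size: 8 x 2 = 16
    lr_scheduler_type="linear",
    warmup_steps=50,
    num_train_epochs=50.0,
    fp16=True,  # "Native AMP" mixed precision
    # Adam betas (0.9, 0.999) and epsilon 1e-08 are the library defaults
    # (adam_beta1, adam_beta2, adam_epsilon), so they need not be set here.
)
```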
#### Training results
| Training Loss | Epoch | Step | Validation Loss | Wer    |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| 10.1224       | 1.96  | 100  | 3.5429          | 1.0    |
| 3.2411        | 3.92  | 200  | 3.1786          | 1.0    |
| 3.1283        | 5.88  | 300  | 3.0571          | 1.0    |
| 3.0044        | 7.84  | 400  | 2.9560          | 0.9996 |
| 2.9388        | 9.8   | 500  | 2.8977          | 1.0011 |
| 2.86          | 11.76 | 600  | 2.6944          | 0.9952 |
| 2.5538        | 13.73 | 700  | 2.0967          | 0.9435 |
| 2.1214        | 15.69 | 800  | 1.4816          | 0.8428 |
| 1.8136        | 17.65 | 900  | 1.2459          | 0.8048 |
| 1.6795        | 19.61 | 1000 | 1.1232          | 0.7649 |
| 1.5571        | 21.57 | 1100 | 1.0510          | 0.7432 |
| 1.4975        | 23.53 | 1200 | 1.0298          | 0.6963 |
| 1.4485        | 25.49 | 1300 | 0.9775          | 0.7074 |
| 1.3924        | 27.45 | 1400 | 0.9798          | 0.6956 |
| 1.3604        | 29.41 | 1500 | 0.9345          | 0.7092 |
| 1.3224        | 31.37 | 1600 | 0.9535          | 0.6830 |
| 1.2816        | 33.33 | 1700 | 0.9178          | 0.6679 |
| 1.2623        | 35.29 | 1800 | 0.9249          | 0.6679 |
| 1.2421        | 37.25 | 1900 | 0.9124          | 0.6734 |
| 1.2208        | 39.22 | 2000 | 0.8962          | 0.6664 |
| 1.2145        | 41.18 | 2100 | 0.8903          | 0.6734 |
| 1.1888        | 43.14 | 2200 | 0.8883          | 0.6708 |
| 1.1933        | 45.1  | 2300 | 0.8928          | 0.6723 |
| 1.1838        | 47.06 | 2400 | 0.8868          | 0.6679 |
| 1.1634        | 49.02 | 2500 | 0.8886          | 0.6657 |
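The Wer column above is the word error rate: substitutions, insertions, and deletions divided by the number of reference words (lower is better). A minimal sketch of computing it with the `jiwer` library (the choice of library is an assumption; the original evaluation code is not shown):

```python
# Minimal WER sketch using jiwer (an assumption; any WER implementation
# that counts substitutions, insertions, and deletions works the same way).
import jiwer

reference = "the cat sat on the mat"  # ground-truth transcript
hypothesis = "the cat sat on mat"     # model output with one deletion
print(jiwer.wer(reference, hypothesis))  # 1 error / 6 reference words = 0.1667
```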
#### Framework versions
- Transformers 4.17.0.dev0
- Pytorch 1.10.2+cu102
- Datasets 1.18.3
- Tokenizers 0.11.0
## License
This model is licensed under the Apache-2.0 license.