XLS - R - 300M - UR Open - source Automatic Speech Recognition Model - Free and Accurate Conversion of Urdu Speech to Text

Xls R 300m Ur

Developed by HarrisDePerceptron

This is an automatic speech recognition model fine-tuned on the Common Voice 8.0 Urdu dataset based on the XLS-R architecture, with a word error rate (WER) of 47.38.

Speech Recognition

Transformers

OtherOpen Source License:Apache-2.0 #Urdu ASR #Low-resource speech recognition #XLS-R fine-tuning

Downloads 19

Release Time : 3/2/2022

Model Overview

This model is specifically designed for automatic speech recognition tasks in Urdu, fine-tuned on the 300M-parameter XLS-R architecture.

Model Features

Urdu Speech Recognition

Speech recognition capabilities optimized specifically for Urdu

Based on Common Voice Dataset

Trained using the Mozilla Common Voice 8.0 Urdu dataset

XLS-R Architecture

Based on the powerful XLS-R 300M parameter model architecture

Model Capabilities

Urdu speech-to-text

Automatic speech recognition

Use Cases

Speech Transcription

Urdu Speech Transcription

Convert Urdu speech content into text

Word error rate 47.38

Voice Assistants

Urdu Voice Command Recognition

Used for command recognition in Urdu voice assistant systems

🚀 Automatic Speech Recognition Model

This model is a fine - tuned speech recognition model based on the HarrisDePerceptron/xls - r - 300m - ur model. It is trained on the MOZILLA - FOUNDATION/COMMON_VOICE_8_0 - UR dataset and shows excellent performance in speech recognition tasks.

🚀 Quick Start

This model is a fine - tuned version of [HarrisDePerceptron/xls - r - 300m - ur](https://huggingface.co/HarrisDePerceptron/xls - r - 300m - ur) on the MOZILLA - FOUNDATION/COMMON_VOICE_8_0 - UR dataset. It achieves the following results on the evaluation set:

Loss: 1.0517
WER: 0.5151291512915129
CER: 0.23689640940982254

📚 Documentation

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 7.5e - 05
train_batch_size: 8
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon = 1e - 08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 100
num_epochs: 100.0
mixed_precision_training: Native AMP

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
1.2991	1.96	100	0.9769	0.6627
1.3415	3.92	200	0.9701	0.6594
1.2998	5.88	300	0.9678	0.6668
1.2881	7.84	400	0.9650	0.6613
1.2369	9.8	500	0.9392	0.6502
1.2293	11.76	600	0.9536	0.6480
1.1709	13.73	700	0.9265	0.6402
1.1492	15.69	800	0.9636	0.6506
1.1044	17.65	900	0.9305	0.6351
1.0704	19.61	1000	0.9329	0.6280
1.0039	21.57	1100	0.9413	0.6295
0.9756	23.53	1200	0.9718	0.6185
0.9633	25.49	1300	0.9731	0.6133
0.932	27.45	1400	0.9659	0.6199
0.9252	29.41	1500	0.9766	0.6196
0.9172	31.37	1600	1.0052	0.6199
0.8733	33.33	1700	0.9955	0.6203
0.868	35.29	1800	1.0069	0.6240
0.8547	37.25	1900	0.9783	0.6258
0.8451	39.22	2000	0.9845	0.6052
0.8374	41.18	2100	0.9496	0.6137
0.8153	43.14	2200	0.9756	0.6122
0.8134	45.1	2300	0.9712	0.6096
0.8019	47.06	2400	0.9565	0.5970
0.7746	49.02	2500	0.9864	0.6096
0.7664	50.98	2600	0.9988	0.6092
0.7708	52.94	2700	1.0181	0.6255
0.7468	54.9	2800	0.9918	0.6148
0.7241	56.86	2900	1.0150	0.6018
0.7165	58.82	3000	1.0439	0.6063
0.7104	60.78	3100	1.0016	0.6037
0.6954	62.75	3200	1.0117	0.5970
0.6753	64.71	3300	1.0191	0.6037
0.6803	66.67	3400	1.0190	0.6033
0.661	68.63	3500	1.0284	0.6007
0.6597	70.59	3600	1.0060	0.5967
0.6398	72.55	3700	1.0372	0.6048
0.6105	74.51	3800	1.0048	0.6044
0.6164	76.47	3900	1.0398	0.6148
0.6354	78.43	4000	1.0272	0.6133
0.5952	80.39	4100	1.0364	0.6081
0.5814	82.35	4200	1.0418	0.6092
0.6079	84.31	4300	1.0277	0.5967
0.5748	86.27	4400	1.0362	0.6041
0.5624	88.24	4500	1.0427	0.6007
0.5767	90.2	4600	1.0370	0.5919
0.5793	92.16	4700	1.0442	0.6011
0.547	94.12	4800	1.0516	0.5982
0.5513	96.08	4900	1.0461	0.5989
0.5429	98.04	5000	1.0504	0.5996
0.5404	100.0	5100	1.0517	0.5967

Framework versions

Transformers 4.17.0.dev0
Pytorch 1.10.2+cu102
Datasets 1.18.3
Tokenizers 0.11.0

📄 License

This model is licensed under the Apache - 2.0 license.

Property	Details
Model Type	Fine - tuned Automatic Speech Recognition Model
Training Data	MOZILLA - FOUNDATION/COMMON_VOICE_8_0 - UR dataset

Featured Recommended AI Models

Empowering the Future, Your AI Solution Knowledge Base

English 简体中文繁體中文にほんご