Wav2vec2-Urdu: Open-Source Urdu Speech Recognition Model for Speech-to-Text Conversion

Wav2vec2 Urdu

Developed by kingabzpro

Urdu automatic speech recognition model based on wav2vec2 architecture, fine-tuned on Common Voice dataset

Speech Recognition

Transformers

Open Source License: Apache-2.0

#Urdu speech recognition #Low-resource fine-tuning #Multi-dialect adaptation

Downloads 101

Release Time: 3/2/2022

Model Overview

This is an automatic speech recognition (ASR) model optimized for Urdu, based on Facebook's wav2vec2 architecture and fine-tuned on the Common Voice Urdu dataset.

Model Features

Urdu optimization

Specifically optimized for Urdu speech recognition tasks

Based on wav2vec2 architecture

Uses Facebook's wav2vec2 architecture, which learns speech representations directly from raw audio waveforms

Low-resource fine-tuning

Fine-tuned on a small amount of Urdu speech data (0.58 hours of training and validation data combined)

Model Capabilities

Urdu speech recognition

Speech-to-text

Automatic speech recognition

Use Cases

Speech transcription

Urdu speech transcription

Convert Urdu speech to text

Word Error Rate 57.47%, Character Error Rate 32.68% on the evaluation set

Voice assistants

Urdu voice command recognition

For voice command recognition in Urdu voice assistants or control systems

🚀 wav2vec2-large-xls-r-300m-Urdu

This model is a fine-tuned wav2vec2 checkpoint for automatic speech recognition. It targets Urdu speech recognition and was fine-tuned on the common_voice dataset.

🚀 Quick Start

This model is a fine-tuned version of Harveenchadha/vakyansh-wav2vec2-urdu-urm-60 on the common_voice dataset. It achieves the following results on the evaluation set:

Wer: 0.5747
Cer: 0.3268
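
To try the model on an audio file, a minimal sketch with the Hugging Face transformers pipeline might look like the following. The Hub id kingabzpro/wav2vec2-large-xls-r-300m-Urdu is an assumption, built from the developer name and model name on this page, and the audio file name is hypothetical. It also assumes transformers, PyTorch, and ffmpeg (for decoding audio files) are installed.

```python
# Assumed Hub id: developer name + model name as they appear on this page.
MODEL_ID = "kingabzpro/wav2vec2-large-xls-r-300m-Urdu"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file and return the recognized text."""
    # Imported lazily so this module still loads where transformers is absent.
    from transformers import pipeline

    # Downloads the checkpoint on first use; the task name matches the
    # pipeline tag listed for this model.
    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(audio_path)
    return result["text"]


# Example call (hypothetical file name):
# print(transcribe("sample_urdu.wav"))
```

Given the reported error rates, transcripts from this sketch should be expected to contain frequent word-level mistakes.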

✨ Features

Automatic Speech Recognition: Specialized for Urdu speech recognition.
Fine-Tuned Model: Based on the Harveenchadha/vakyansh-wav2vec2-urdu-urm-60 base model, fine-tuned on the common_voice dataset.

📚 Documentation

Model description

The training and validation data total 0.58 hours. Training any model on so few samples proved difficult, so the author took the vakyansh-wav2vec2-urdu-urm-60 checkpoint and fine-tuned it.

Training procedure

Training started from the Harveenchadha/vakyansh-wav2vec2-urdu-urm-60 checkpoint because of the small number of samples.

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 64
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 100
num_epochs: 100
mixed_precision_training: Native AMP
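
The relationship between these values can be checked with a little arithmetic. The sketch below assumes a single training device (which is what 64 × 2 = 128 implies) and takes the total of 300 optimizer steps from the training results table. The function names are illustrative, and the schedule is the usual linear warmup followed by linear decay, not code taken from the author's training script.

```python
LEARNING_RATE = 0.0003
TRAIN_BATCH_SIZE = 64
GRAD_ACCUM_STEPS = 2
WARMUP_STEPS = 100
NUM_EPOCHS = 100
TOTAL_STEPS = 300  # last step listed in the training results table

# 300 steps over 100 epochs means 3 optimizer steps per epoch.
STEPS_PER_EPOCH = TOTAL_STEPS // NUM_EPOCHS


def effective_batch_size(per_device: int, accum_steps: int, devices: int = 1) -> int:
    """Samples consumed per optimizer step."""
    return per_device * accum_steps * devices


def linear_lr(step: int, base_lr: float = LEARNING_RATE,
              warmup: int = WARMUP_STEPS, total: int = TOTAL_STEPS) -> float:
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup:
        return base_lr * step / warmup
    remaining = max(0, total - step)
    return base_lr * remaining / max(1, total - warmup)


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
    for step in (0, 50, 100, 200, 300):
        print(step, linear_lr(step))
```

Under these assumptions the learning rate peaks at step 100, one third of the way through training, and decays to zero at step 300.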

Training results

Training Loss	Epoch	Step	Validation Loss	Wer	Cer
4.3054	16.67	50	9.0055	0.8306	0.4869
2.0629	33.33	100	9.5849	0.6061	0.3414
0.8966	50.0	150	4.8686	0.6052	0.3426
0.4197	66.67	200	12.3261	0.5817	0.3370
0.294	83.33	250	11.9653	0.5712	0.3328
0.2329	100.0	300	7.6846	0.5747	0.3268
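
The table can be read programmatically to see which logged step scores best on each metric. The sketch below copies the rows as they appear above; the names are illustrative, and it makes no claim about how the author chose the final checkpoint.

```python
from collections import namedtuple

Row = namedtuple("Row", "train_loss epoch step val_loss wer cer")

# Rows copied from the training results table.
ROWS = [
    Row(4.3054, 16.67, 50, 9.0055, 0.8306, 0.4869),
    Row(2.0629, 33.33, 100, 9.5849, 0.6061, 0.3414),
    Row(0.8966, 50.0, 150, 4.8686, 0.6052, 0.3426),
    Row(0.4197, 66.67, 200, 12.3261, 0.5817, 0.3370),
    Row(0.294, 83.33, 250, 11.9653, 0.5712, 0.3328),
    Row(0.2329, 100.0, 300, 7.6846, 0.5747, 0.3268),
]

best_wer = min(ROWS, key=lambda r: r.wer)
best_cer = min(ROWS, key=lambda r: r.cer)
best_val_loss = min(ROWS, key=lambda r: r.val_loss)

if __name__ == "__main__":
    print("lowest WER at step", best_wer.step, "->", best_wer.wer)
    print("lowest CER at step", best_cer.step, "->", best_cer.cer)
    print("lowest validation loss at step", best_val_loss.step)
```

The reported evaluation figures (Wer 0.5747, Cer 0.3268) match the final row at step 300. Step 250 has a slightly lower Wer, and validation loss fluctuates across steps while training loss falls steadily.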

Framework versions

Transformers 4.16.0.dev0
Pytorch 1.10.1+cu102
Datasets 1.17.1.dev0
Tokenizers 0.11.0

📄 License

This project is licensed under the Apache-2.0 license.

Property	Details
Model Type	Fine-tuned wav2vec2 model for Urdu speech recognition
Training Data	mozilla-foundation/common_voice_8_0
Metrics	Wer, Cer
Pipeline Tag	automatic-speech-recognition
Base Model	Harveenchadha/vakyansh-wav2vec2-urdu-urm-60
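
The Wer and Cer metrics listed above are conventionally edit-distance ratios: the number of word (or character) insertions, deletions, and substitutions needed to turn the model output into the reference, divided by the reference length. The sketch below is a plain standard-library illustration of that definition, not the evaluation code used for this model. Text normalization and whether spaces count toward Cer are assumptions here (this sketch counts them).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

On this definition, a Wer of 0.5747 corresponds to roughly 57 word-level edits for every 100 reference words.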
