wav2vec2-large-xls-r-300m-urdu Open-source Speech Recognition Model - Supports Automatic Urdu Speech Recognition

Wav2vec2 Large Xls R 300m Urdu

Developed by infinitejoy

This is an automatic speech recognition model for Urdu, fine-tuned from facebook/wav2vec2-xls-r-300m on the Urdu subset of Common Voice 7.

Speech Recognition

Transformers

Other | Open Source License: Apache-2.0 | #Urdu speech recognition #Multi-dialect support #Low-resource optimization

Downloads: 15

Release time: 3/2/2022

Model Overview

This model performs automatic speech recognition (ASR) for Urdu, transcribing spoken Urdu into text.

Model Features

Urdu speech recognition

Speech recognition capability specifically optimized for Urdu

Based on XLS-R architecture

Uses Facebook's XLS-R-300M pre-trained model as the foundation

Trained on Common Voice dataset

Fine-tuned on Mozilla Common Voice 7 Urdu dataset

Model Capabilities

Urdu speech-to-text

Automatic speech recognition

Use Cases

Speech transcription

Urdu speech transcription

Convert Urdu speech content into text

Voice assistants

Urdu voice interaction

Provide speech recognition for Urdu voice assistants

🚀 XLS-R-300M - Urdu

This is a model fine-tuned for automatic speech recognition in Urdu, based on the pre-trained wav2vec2-xls-r-300m model.

🚀 Quick Start

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the mozilla-foundation/common_voice_7_0 dataset (ur configuration). It achieves the following results on the evaluation set:

Loss: NA
WER: NA

✨ Features

Tags: automatic-speech-recognition, generated_from_trainer, hf-asr-leaderboard, model_for_talk, mozilla-foundation/common_voice_7_0, robust-speech-event, ur
Datasets: mozilla-foundation/common_voice_7_0

📦 Installation

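No specific installation steps are provided. The usage example imports torch, torchaudio, transformers and datasets, so these packages must be installed. It also passes use_auth_token=True, so a Hugging Face login is needed to stream the gated Common Voice 7 dataset.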

💻 Usage Examples

Basic Usage

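import torch
import torchaudio.functional as F
from datasets import load_dataset
from transformers import AutoModelForCTC, AutoProcessor, Wav2Vec2ProcessorWithLM

model_id = "infinitejoy/wav2vec2-large-xls-r-300m-urdu"

# Stream one Urdu test example from Common Voice 7 (gated dataset, hence the auth token).
sample_iter = iter(load_dataset("mozilla-foundation/common_voice_7_0", "ur", split="test", streaming=True, use_auth_token=True))
sample = next(sample_iter)

# Common Voice audio is 48 kHz; the model expects 16 kHz input.
orig_sr = sample["audio"]["sampling_rate"]
resampled_audio = F.resample(torch.tensor(sample["audio"]["array"]), orig_sr, 16_000).numpy()

model = AutoModelForCTC.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

input_values = processor(resampled_audio, sampling_rate=16_000, return_tensors="pt").input_values

with torch.no_grad():
    logits = model(input_values).logits

# With an n-gram LM in the repo (and pyctcdecode installed), AutoProcessor returns a
# Wav2Vec2ProcessorWithLM, whose batch_decode takes raw logits and returns an object with .text.
# A plain Wav2Vec2Processor instead expects predicted token ids (greedy CTC decoding).
if isinstance(processor, Wav2Vec2ProcessorWithLM):
    transcription = processor.batch_decode(logits.numpy()).text
else:
    predicted_ids = torch.argmax(logits, dim=-1)
    transcription = processor.batch_decode(predicted_ids)

print(transcription)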
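When no language model is used, the transcription comes from greedy (best-path) CTC decoding of the logits: take the most likely token per frame, merge consecutive repeats, then drop the blank token. The sketch below shows that step without Transformers, on a hypothetical toy vocabulary. It assumes, as in standard wav2vec2 CTC fine-tuning, that the padding token doubles as the CTC blank and that `|` marks word boundaries; the ids and characters are illustrative, not this model's actual tokenizer.

```python
from itertools import groupby
from typing import Dict, List


def ctc_greedy_decode(frame_ids: List[int], id_to_char: Dict[int, str], blank_id: int) -> str:
    """Greedy CTC decoding of per-frame argmax ids (hypothetical toy vocabulary)."""
    # 1. Merge consecutive frames that predicted the same token id.
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # 2. Remove the blank token; a blank between two equal ids keeps them as separate characters.
    kept = [token_id for token_id in collapsed if token_id != blank_id]
    # 3. Map ids to characters and turn the assumed "|" word delimiter into spaces.
    return "".join(id_to_char[token_id] for token_id in kept).replace("|", " ").strip()


# Hypothetical vocabulary: id 0 is the pad/blank token, id 3 the word delimiter.
toy_vocab = {0: "<pad>", 1: "a", 2: "b", 3: "|"}
print(ctc_greedy_decode([1, 1, 0, 1, 3, 2, 2, 0, 0], toy_vocab, blank_id=0))
```

The blank between the two runs of `1` is what lets CTC emit a genuine double letter ("aa") instead of collapsing it to a single "a".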

Advanced Usage

Evaluation Commands

To evaluate on speech-recognition-community-v2/dev_data (ur configuration) with split validation:

python eval.py \
    --model_id infinitejoy/wav2vec2-large-xls-r-300m-urdu --dataset speech-recognition-community-v2/dev_data \
    --config ur --split validation --chunk_length_s 10 --stride_length_s 1
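eval.py itself is not included in this document. The --chunk_length_s 10 --stride_length_s 1 flags presumably enable chunked inference, as in the Transformers ASR pipeline: long recordings are split into 10 s windows, and 1 s of context on each side of a window is used for inference and then discarded. The following minimal sketch only computes such window boundaries. It assumes 16 kHz audio and a symmetric stride, and `chunk_windows` is a hypothetical helper, not the pipeline's implementation.

```python
from typing import List, Tuple


def chunk_windows(num_samples: int, sampling_rate: int = 16_000,
                  chunk_length_s: float = 10.0, stride_length_s: float = 1.0) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of overlapping inference windows.

    Consecutive windows overlap by 2 * stride samples, so the stride on each
    side can be dropped after inference without losing any audio.
    """
    chunk_len = int(chunk_length_s * sampling_rate)
    stride = int(stride_length_s * sampling_rate)
    step = chunk_len - 2 * stride  # distance between consecutive window starts
    if step <= 0:
        raise ValueError("chunk_length_s must exceed 2 * stride_length_s")
    windows = []
    for start in range(0, num_samples, step):
        end = min(start + chunk_len, num_samples)
        windows.append((start, end))
        if end >= num_samples:  # last window reaches the end of the audio
            break
    return windows


# 25 s of 16 kHz audio yields three overlapping 10 s (or shorter, final) windows.
print(chunk_windows(25 * 16_000))
```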

📚 Documentation

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 7.5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 2000
num_epochs: 50.0
mixed_precision_training: Native AMP
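Two of these values are linked by simple arithmetic. The effective batch size is train_batch_size × gradient_accumulation_steps = 8 × 4 = 32. The linear scheduler ramps the learning rate from 0 to 7.5e-05 over 2000 warmup steps, then decays it linearly to 0. The sketch below reproduces that shape, following the linear-with-warmup formula used by the Transformers Trainer for lr_scheduler_type: linear, to our understanding. The card does not state the total number of optimisation steps (it depends on the size of the Urdu training split and the 50 epochs), so `total_steps=20_000` is a hypothetical placeholder.

```python
def linear_warmup_lr(step: int, peak_lr: float = 7.5e-5, warmup_steps: int = 2000,
                     total_steps: int = 20_000) -> float:
    """Learning rate at a given optimiser step: linear warmup, then linear decay to 0.

    total_steps is a hypothetical placeholder; the card does not report it.
    """
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Effective batch size per optimiser step: per-device batch x gradient-accumulation steps.
train_batch_size = 8
gradient_accumulation_steps = 4
total_train_batch_size = train_batch_size * gradient_accumulation_steps  # 32, as listed above

for s in (0, 1000, 2000, 11_000, 20_000):
    print(s, linear_warmup_lr(s))
```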

Framework versions

Transformers 4.16.0.dev0
Pytorch 1.10.0+cu102
Datasets 1.17.1.dev0
Tokenizers 0.10.3

Model Index

Property	Details
Model Name	XLS-R-300M - Urdu
Task	Automatic Speech Recognition
Dataset	Common Voice 7 (mozilla-foundation/common_voice_7_0, args: ur)
Test WER	105.66
Test CER	434.011
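Both scores exceed 100, which is possible because word and character error rates count insertions. WER is (substitutions + deletions + insertions) divided by the number of reference words, and CER is the same ratio over characters, so a hypothesis much longer than the reference can push the rate above 100. The minimal sketch below, assuming the reported figures are percentages and that words are split on whitespace, computes both from a Levenshtein distance. `wer` and `cer` are illustrative helpers, not the metric implementation used by eval.py, and they assume a non-empty reference.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # deletion of r
                          curr[j - 1] + 1,         # insertion of h
                          prev[j - 1] + (r != h))  # substitution, or match at no cost
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent, using whitespace tokenisation (an assumption)."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent."""
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)


# One substitution plus two insertions against a two-word reference gives 150% WER.
print(wer("a b", "a x y z"))
print(cer("ab", "xxxxx"))
```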

📄 License

This model is licensed under the Apache-2.0 license.
