wav2vec2-large-xls-r-300m-pa-IN-dx1
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - PA-IN dataset. It can be used for automatic speech recognition and achieves the WER and CER results reported below on the evaluation set.
Features
- Language Support: Specifically fine-tuned for Punjabi (pa-IN).
- Multiple Datasets: Evaluated on multiple datasets, including Common Voice 8 and Robust Speech Event - Dev Data.
- Performance Metrics: Reports word error rate (WER) and character error rate (CER) on the test set.
Installation
No installation steps are provided in the original document, so this section is skipped.
Usage Examples
No code examples are provided in the original document.
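Below is a minimal inference sketch, assuming the standard transformers automatic-speech-recognition pipeline; the model id is taken from the evaluation command in this card, while the audio file path and the 16 kHz mono input format are assumptions rather than details stated in the original card.

```python
from transformers import pipeline

# Load the fine-tuned Punjabi ASR model (model id taken from the evaluation command below)
asr = pipeline(
    "automatic-speech-recognition",
    model="DrishtiSharma/wav2vec2-large-xls-r-300m-pa-IN-dx1",
)

# Transcribe a local audio file; "sample_pa.wav" is a placeholder path (16 kHz mono audio expected)
result = asr("sample_pa.wav")
print(result["text"])
```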
Documentation
Model Information
| Property | Details |
|----------|---------|
| Model Name | wav2vec2-large-xls-r-300m-pa-IN-dx1 |
| Model Type | Fine-tuned from facebook/wav2vec2-xls-r-300m |
| Training Datasets | mozilla-foundation/common_voice_8_0 |
| Languages Supported | pa-IN |
Evaluation Results
This model achieves the following results on different evaluation sets:
| Task | Dataset | Test WER | Test CER |
|------|---------|----------|----------|
| Automatic Speech Recognition | Common Voice 8 (pa-IN) | 0.48725989807918463 | 0.1687305197540224 |
| Automatic Speech Recognition | Robust Speech Event - Dev Data (pa-IN) | NA | NA |
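For reference, WER and CER scores of this kind are typically computed with the Hugging Face `evaluate` library; the snippet below is only an illustrative sketch with placeholder strings, not a reproduction of the numbers above.

```python
import evaluate

# Word error rate and character error rate metrics from the `evaluate` library
wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

# Placeholder transcriptions; in practice these come from running the model on the test split
predictions = ["ਇਹ ਇੱਕ ਉਦਾਹਰਨ ਹੈ"]
references = ["ਇਹ ਇੱਕ ਉਦਾਹਰਣ ਹੈ"]

print("WER:", wer_metric.compute(predictions=predictions, references=references))
print("CER:", cer_metric.compute(predictions=predictions, references=references))
```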
Evaluation Commands
- Evaluate on mozilla-foundation/common_voice_8_0 with the test split:
python eval.py --model_id DrishtiSharma/wav2vec2-large-xls-r-300m-pa-IN-dx1 --dataset mozilla-foundation/common_voice_8_0 --config pa-IN --split test --log_outputs
- Evaluate on speech-recognition-community-v2/dev_data:
The Punjabi language isn't available in speech-recognition-community-v2/dev_data, so no results are reported for it.
Training Hyperparameters
The following hyperparameters were used during training:
- Learning Rate: 0.0003
- Train Batch Size: 16
- Eval Batch Size: 8
- Seed: 42
- Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- LR Scheduler Type: linear
- LR Scheduler Warmup Steps: 1200
- Number of Epochs: 100.0
- Mixed Precision Training: Native AMP
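As a non-authoritative illustration, these values roughly correspond to the following `TrainingArguments` in a standard `Trainer`-based fine-tuning script; the mapping is an assumption, and `output_dir` is a placeholder rather than a detail from the original training configuration.

```python
from transformers import TrainingArguments

# Sketch of the reported hyperparameters expressed as TrainingArguments
training_args = TrainingArguments(
    output_dir="wav2vec2-large-xls-r-300m-pa-IN-dx1",  # placeholder output directory
    learning_rate=3e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=1200,
    num_train_epochs=100.0,
    fp16=True,  # native AMP mixed-precision training
)
```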
Training Results
| Training Loss | Epoch | Step | Validation Loss | WER |
|---------------|-------|------|-----------------|-----|
| 3.4607 | 9.26 | 500 | 2.7746 | 1.0416 |
| 0.3442 | 18.52 | 1000 | 0.9114 | 0.5911 |
| 0.2213 | 27.78 | 1500 | 0.9687 | 0.5751 |
| 0.1242 | 37.04 | 2000 | 1.0204 | 0.5461 |
| 0.0998 | 46.3 | 2500 | 1.0250 | 0.5233 |
| 0.0727 | 55.56 | 3000 | 1.1072 | 0.5382 |
| 0.0605 | 64.81 | 3500 | 1.0588 | 0.5073 |
| 0.0458 | 74.07 | 4000 | 1.0818 | 0.5069 |
| 0.0338 | 83.33 | 4500 | 1.0948 | 0.5108 |
| 0.0223 | 92.59 | 5000 | 1.0986 | 0.4775 |
Framework Versions
- Transformers: 4.17.0.dev0
- Pytorch: 1.10.2+cu102
- Datasets: 1.18.2.dev0
- Tokenizers: 0.11.0
Technical Details
The model is a fine-tuned version of the pre-trained facebook/wav2vec2-xls-r-300m checkpoint. It was trained with the hyperparameters listed above and evaluated on multiple datasets to assess its performance on the Punjabi language.
License
This model is released under the Apache 2.0 license.