# Wav2Vec2-XLS-R-300M for Punjabi (pa-IN)
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the Punjabi (pa-IN) subset of the Mozilla Foundation's Common Voice 8.0 dataset. It provides automatic speech recognition for the Punjabi language.
## Quick Start
### Evaluation

To evaluate the model, run the following command.

Evaluate on `mozilla-foundation/common_voice_8_0` with the test split:

```bash
python eval.py --model_id DrishtiSharma/wav2vec2-xls-r-300m-pa-IN-r5 --dataset mozilla-foundation/common_voice_8_0 --config pa-IN --split test --log_outputs
```

Evaluation on `speech-recognition-community-v2/dev_data` is not possible, because the Punjabi language is not available in that dataset.
## Features
- Fine-tuned for Punjabi: trained specifically on the Punjabi (pa-IN) subset of the Common Voice 8.0 dataset, for better performance on Punjabi speech recognition.
- Strong results: achieves a test WER of 0.4187 and a test CER of 0.1330 on Common Voice 8.0.
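WER and CER are word- and character-level edit-distance error rates. As a point of reference, here is a minimal sketch of how they are computed (not the scorer that `eval.py` actually uses):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (single-row DP)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            # min of deletion, insertion, and substitution/match
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (r != h))
    return dp[-1]

def wer(reference, hypothesis):
    """Word error rate: word-level edits / reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character error rate: character-level edits / reference length."""
    return edit_distance(reference, hypothesis) / len(reference)

print(wer("a b c d", "a x c d"))  # → 0.25 (one substitution out of four words)
```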
## Installation

No specific installation steps are provided in the original README.
## Usage Examples

### Basic Usage

Run the evaluation commands shown above to test the model's performance.
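For direct transcription, the model can be loaded with the standard `transformers` CTC API. This is a sketch using greedy (argmax) decoding; the silent dummy clip below is a stand-in for real 16 kHz mono Punjabi audio:

```python
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "DrishtiSharma/wav2vec2-xls-r-300m-pa-IN-r5"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# Replace this one-second silent clip with a real 16 kHz mono waveform.
speech = torch.zeros(16000).numpy()

inputs = processor(speech, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding; a language model could further improve results.
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
print(transcription)
```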
## Documentation

### Model Information

| Property | Details |
|----------|---------|
| Model Type | Fine-tuned wav2vec2-xls-r-300m for Punjabi (pa-IN) |
| Training Data | mozilla-foundation/common_voice_8_0 (pa-IN subset) |
### Evaluation Results

This model achieves the following results on the evaluation set:

- Test WER: 0.4187
- Test CER: 0.1330
### Training Hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.000111
- train_batch_size: 16
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2000
- num_epochs: 200.0
- mixed_precision_training: Native AMP
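The `linear` scheduler ramps the learning rate up from 0 over the 2,000 warmup steps and then decays it linearly back to 0. A minimal sketch, assuming 5,000 total optimizer steps (the last step reported in the training results):

```python
def linear_lr(step, base_lr=1.11e-4, warmup_steps=2000, total_steps=5000):
    """Linear warmup followed by linear decay (Hugging Face `linear` schedule)."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0.0, (total_steps - step) / (total_steps - warmup_steps))
    return base_lr * remaining

print(linear_lr(1000))  # halfway through warmup → half the base learning rate
```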
### Training Results

| Training Loss | Epoch | Step | Validation Loss | WER |
|---------------|-------|------|-----------------|--------|
| 10.695 | 18.52 | 500 | 3.5681 | 1.0 |
| 3.2718 | 37.04 | 1000 | 2.3081 | 0.9643 |
| 0.8727 | 55.56 | 1500 | 0.7227 | 0.5147 |
| 0.3349 | 74.07 | 2000 | 0.7498 | 0.4959 |
| 0.2134 | 92.59 | 2500 | 0.7779 | 0.4720 |
| 0.1445 | 111.11 | 3000 | 0.8120 | 0.4594 |
| 0.1057 | 129.63 | 3500 | 0.8225 | 0.4610 |
| 0.0826 | 148.15 | 4000 | 0.8307 | 0.4351 |
| 0.0639 | 166.67 | 4500 | 0.8967 | 0.4316 |
| 0.0528 | 185.19 | 5000 | 0.8875 | 0.4238 |
### Framework Versions
- Transformers 4.17.0.dev0
- Pytorch 1.10.2+cu102
- Datasets 1.18.2.dev0
- Tokenizers 0.11.0
## License

This model is licensed under the Apache 2.0 license.