🚀 wav2vec2-large-xls-r-300m-or-d5
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - OR (Odia) dataset. It is designed for automatic speech recognition and achieves the evaluation results reported below.
✨ Features
- Multilingual Adaptability: Built on the `wav2vec2-xls-r-300m` architecture, it can be adapted to different language scenarios.
- High-Precision Recognition: Demonstrates a relatively low Word Error Rate (WER) and Character Error Rate (CER) on the evaluation set (a sketch of how these metrics are computed follows this list).
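For reference, WER and CER can be computed with the jiwer library. This is an illustrative sketch, not part of the original card; the strings are placeholders and `jiwer` is an assumed dependency.

```python
# Illustrative WER/CER computation with jiwer (assumed dependency;
# not part of the original model card). Strings are placeholders.
import jiwer

reference = "the ground truth transcription"
hypothesis = "the ground tooth transcription"

print("WER:", jiwer.wer(reference, hypothesis))  # word-level edit distance / reference word count
print("CER:", jiwer.cer(reference, hypothesis))  # character-level edit distance / reference char count
```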
📦 Installation
No specific installation steps are provided in the original document. In practice, the model is typically used via the Hugging Face `transformers` library together with `torch` and `torchaudio` (e.g., `pip install transformers torch torchaudio`); note that this is an assumption rather than guidance from the original card.
💻 Usage Examples
Evaluation Commands
1. To evaluate on `mozilla-foundation/common_voice_8_0` with the `test` split:

```bash
python eval.py --model_id DrishtiSharma/wav2vec2-large-xls-r-300m-or-d5 --dataset mozilla-foundation/common_voice_8_0 --config or --split test --log_outputs
```

2. To evaluate on `speech-recognition-community-v2/dev_data`:

```bash
python eval.py --model_id DrishtiSharma/wav2vec2-large-xls-r-300m-or-d5 --dataset speech-recognition-community-v2/dev_data --config or --split validation --chunk_length_s 10 --stride_length_s 1
```
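The original card only provides these evaluation commands. As a minimal transcription sketch, assuming the standard `transformers` API for wav2vec2 CTC models (the audio path is a placeholder):

```python
# Minimal inference sketch (assumed usage, not from the original card).
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "DrishtiSharma/wav2vec2-large-xls-r-300m-or-d5"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# Load an audio clip and resample to the 16 kHz rate wav2vec2 expects.
waveform, sample_rate = torchaudio.load("sample.wav")  # placeholder path
waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000).squeeze()

inputs = processor(waveform, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding.
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```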
📚 Documentation
Evaluation Results
This model achieves the following results on the evaluation set (final training step):
- Loss: 0.9571
- Wer: 0.5450
Model Index

| Task | Dataset | Test WER | Test CER |
|------|---------|----------|----------|
| Automatic Speech Recognition | Common Voice 8 (mozilla-foundation/common_voice_8_0, config `or`) | 0.579136690647482 | 0.1572148018392818 |
| Automatic Speech Recognition | Robust Speech Event - Dev Data (speech-recognition-community-v2/dev_data, config `or`) | NA | NA |
Training Hyperparameters
The following hyperparameters were used during training:
| Hyperparameter | Value |
|----------------|-------|
| learning_rate | 0.000111 |
| train_batch_size | 16 |
| eval_batch_size | 8 |
| seed | 42 |
| gradient_accumulation_steps | 2 |
| total_train_batch_size | 32 |
| optimizer | Adam with betas=(0.9, 0.999) and epsilon=1e-08 |
| lr_scheduler_type | linear |
| lr_scheduler_warmup_steps | 800 |
| num_epochs | 200 |
| mixed_precision_training | Native AMP |
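For illustration, these hyperparameters map roughly onto `transformers.TrainingArguments` as sketched below. This is a hypothetical reconstruction, since the original training script is not included in the card (`output_dir` is a placeholder):

```python
# Hypothetical reconstruction of the training configuration
# (the original training script is not part of the card).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./wav2vec2-large-xls-r-300m-or-d5",  # placeholder
    learning_rate=0.000111,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,   # effective train batch size: 16 * 2 = 32
    lr_scheduler_type="linear",
    warmup_steps=800,
    num_train_epochs=200,
    fp16=True,                       # native AMP mixed-precision training
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the transformers defaults.
)
```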
Training Results
| Training Loss | Epoch | Step | Validation Loss | WER |
|---------------|-------|------|-----------------|-----|
| 9.2958 | 12.5 | 300 | 4.9014 | 1.0 |
| 3.4065 | 25.0 | 600 | 3.5150 | 1.0 |
| 1.5402 | 37.5 | 900 | 0.8356 | 0.7249 |
| 0.6049 | 50.0 | 1200 | 0.7754 | 0.6349 |
| 0.4074 | 62.5 | 1500 | 0.7994 | 0.6217 |
| 0.3097 | 75.0 | 1800 | 0.8815 | 0.5985 |
| 0.2593 | 87.5 | 2100 | 0.8532 | 0.5754 |
| 0.2097 | 100.0 | 2400 | 0.9077 | 0.5648 |
| 0.1784 | 112.5 | 2700 | 0.9047 | 0.5668 |
| 0.1567 | 125.0 | 3000 | 0.9019 | 0.5728 |
| 0.1315 | 137.5 | 3300 | 0.9295 | 0.5827 |
| 0.1125 | 150.0 | 3600 | 0.9256 | 0.5681 |
| 0.1035 | 162.5 | 3900 | 0.9148 | 0.5496 |
| 0.0901 | 175.0 | 4200 | 0.9480 | 0.5483 |
| 0.0817 | 187.5 | 4500 | 0.9799 | 0.5516 |
| 0.079 | 200.0 | 4800 | 0.9571 | 0.5450 |
Framework Versions
- Transformers 4.16.2
- PyTorch 1.10.0+cu111
- Datasets 1.18.3
- Tokenizers 0.11.0
📄 License
This model is licensed under the Apache-2.0 license.