🚀 wav2vec2-large-xls-r-300m-hi-cv8-b2
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the Hindi (hi) subset of the mozilla-foundation/common_voice_8_0 dataset. It is designed for automatic speech recognition (ASR), aiming to provide high-quality speech-to-text conversion for Hindi.
✨ Features
- Multilingual Adaptability: Built on the large-scale pre-trained XLS-R model, it can be adapted to different languages.
- High-Quality Performance: Achieves relatively low Word Error Rate (WER) and Character Error Rate (CER) on the evaluation set (both metrics are illustrated in the sketch below).
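For quick reference, WER and CER are edit-distance rates computed at the word and character level, respectively. A minimal illustration using the jiwer package (an assumption; the original card does not name a scoring library):

```python
# Illustrative only: computing WER/CER with jiwer (hypothetical library choice).
import jiwer

reference = "मौसम बहुत अच्छा है"    # ground-truth transcription
hypothesis = "मौसम बहुत अच्छा हैं"  # model output with one wrong word

print("WER:", jiwer.wer(reference, hypothesis))  # word-level error rate
print("CER:", jiwer.cer(reference, hypothesis))  # character-level error rate
```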
📦 Installation
The original document provides no installation steps. A typical environment for running this model (an assumption, not part of the source card) is sketched below; the pinned versions used for training are listed under Framework Versions.
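```bash
# Hypothetical setup; see Framework Versions below for the exact pinned versions.
pip install transformers datasets torchaudio jiwer
```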
💻 Usage Examples
The original document provides no code examples. The transcription sketch below is an assumption based on the standard transformers Wav2Vec2 API, not the author's own code; the input file name is a hypothetical placeholder.
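```python
# Minimal transcription sketch (not from the original card). Assumes a mono
# audio file "sample_hi.wav" (hypothetical) containing Hindi speech.
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "DrishtiSharma/wav2vec2-large-xls-r-300m-hi-cv8-b2"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# Load the audio and resample to the 16 kHz rate the model expects.
waveform, sample_rate = torchaudio.load("sample_hi.wav")
waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000).squeeze(0)

inputs = processor(waveform, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding: pick the most likely token at each frame.
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```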
📚 Documentation
Model Information
| Property | Details |
|----------|---------|
| Model Type | Fine-tuned wav2vec2 model for Hindi speech recognition |
| Training Data | mozilla-foundation/common_voice_8_0 |
| Metrics | Word Error Rate (WER), Character Error Rate (CER) |
Evaluation Results
At the final checkpoint, the model achieves a validation loss of 0.7322 and a WER of 0.3469 on the evaluation set (see the last row of the Training Results table below).
Evaluation Commands
- To evaluate on mozilla-foundation/common_voice_8_0 with the test split:

```bash
python eval.py --model_id DrishtiSharma/wav2vec2-large-xls-r-300m-hi-cv8-b2 --dataset mozilla-foundation/common_voice_8_0 --config hi --split test --log_outputs
```

- To evaluate on speech-recognition-community-v2/dev_data: not applicable, as Hindi is not available in speech-recognition-community-v2/dev_data.
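For context, the test split that eval.py scores can also be loaded directly. A minimal sketch, assuming access to the gated common_voice_8_0 dataset on the Hugging Face Hub:

```python
# Hedged sketch: load the Hindi test split scored by eval.py above.
# common_voice_8_0 is gated; accept its terms on the Hub and authenticate
# first (e.g. via `huggingface-cli login`).
from datasets import load_dataset

test_ds = load_dataset(
    "mozilla-foundation/common_voice_8_0", "hi",
    split="test", use_auth_token=True,
)
print(test_ds[0]["sentence"])  # reference transcription of the first clip
```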
Training Hyperparameters
The following hyperparameters were used during training (the sketch after this list shows how they map onto the transformers API):
- learning_rate: 0.00025
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 700
- num_epochs: 35
- mixed_precision_training: Native AMP
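The hyperparameters above correspond roughly to the following transformers TrainingArguments. This is a hedged sketch, not the author's actual training script; output_dir is a hypothetical placeholder.

```python
# Hedged sketch: the listed hyperparameters expressed as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./wav2vec2-large-xls-r-300m-hi-cv8-b2",  # hypothetical path
    learning_rate=0.00025,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,  # effective train batch size: 16 * 2 = 32
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    warmup_steps=700,
    num_train_epochs=35,
    fp16=True,  # native AMP mixed-precision training
)
```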
Training Results
| Training Loss | Epoch | Step | Validation Loss | WER |
|---------------|-------|------|-----------------|--------|
| 9.6226 | 1.04 | 200 | 3.8855 | 1.0 |
| 3.4678 | 2.07 | 400 | 3.4283 | 1.0 |
| 2.3668 | 3.11 | 600 | 1.0743 | 0.7175 |
| 0.7308 | 4.15 | 800 | 0.7663 | 0.5498 |
| 0.4985 | 5.18 | 1000 | 0.6957 | 0.5001 |
| 0.3817 | 6.22 | 1200 | 0.6932 | 0.4866 |
| 0.3281 | 7.25 | 1400 | 0.7034 | 0.4983 |
| 0.2752 | 8.29 | 1600 | 0.6588 | 0.4606 |
| 0.2475 | 9.33 | 1800 | 0.6514 | 0.4328 |
| 0.219 | 10.36 | 2000 | 0.6396 | 0.4176 |
| 0.2036 | 11.4 | 2200 | 0.6867 | 0.4162 |
| 0.1793 | 12.44 | 2400 | 0.6943 | 0.4196 |
| 0.1724 | 13.47 | 2600 | 0.6862 | 0.4260 |
| 0.1554 | 14.51 | 2800 | 0.7615 | 0.4222 |
| 0.151 | 15.54 | 3000 | 0.7058 | 0.4110 |
| 0.1335 | 16.58 | 3200 | 0.7172 | 0.3986 |
| 0.1326 | 17.62 | 3400 | 0.7182 | 0.3923 |
| 0.1225 | 18.65 | 3600 | 0.6995 | 0.3910 |
| 0.1146 | 19.69 | 3800 | 0.7075 | 0.3875 |
| 0.108 | 20.73 | 4000 | 0.7297 | 0.3858 |
| 0.1048 | 21.76 | 4200 | 0.7413 | 0.3850 |
| 0.0979 | 22.8 | 4400 | 0.7452 | 0.3793 |
| 0.0946 | 23.83 | 4600 | 0.7436 | 0.3759 |
| 0.0897 | 24.87 | 4800 | 0.7289 | 0.3754 |
| 0.0854 | 25.91 | 5000 | 0.7271 | 0.3667 |
| 0.0803 | 26.94 | 5200 | 0.7378 | 0.3656 |
| 0.0752 | 27.98 | 5400 | 0.7488 | 0.3680 |
| 0.0718 | 29.02 | 5600 | 0.7185 | 0.3619 |
| 0.0702 | 30.05 | 5800 | 0.7428 | 0.3554 |
| 0.0653 | 31.09 | 6000 | 0.7447 | 0.3559 |
| 0.0638 | 32.12 | 6200 | 0.7327 | 0.3523 |
| 0.058 | 33.16 | 6400 | 0.7339 | 0.3488 |
| 0.0594 | 34.2 | 6600 | 0.7322 | 0.3469 |
Framework Versions
- Transformers 4.16.2
- Pytorch 1.10.0+cu111
- Datasets 1.18.3
- Tokenizers 0.11.0
📄 License
This model is licensed under the Apache-2.0 license.