🚀 wav2vec2-large-xls-r-300m-as-v9
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the common_voice dataset, intended for automatic speech recognition of Assamese (as) speech.
✨ Features
- Multilingual Adaptability: fine-tuned from a multilingually pretrained checkpoint on mozilla-foundation/common_voice_8_0 data, making it adaptable to low-resource language scenarios.
- Evaluation Metrics: performance is reported as word error rate (WER) and character error rate (CER).
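To make the reported metrics concrete, here is a minimal, illustrative sketch of how WER is computed: the word-level Levenshtein (edit) distance between reference and hypothesis, divided by the reference length. CER is the same computation over characters instead of words. This is not the card's actual evaluation script, just a self-contained reference implementation.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting i words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting j words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution over three reference words -> WER = 1/3
print(wer("the cat sat", "the bat sat"))
```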
📦 Installation
No installation steps are provided in the original document.
💻 Usage Examples
No code examples are provided in the original document.
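As a starting point, the checkpoint can be loaded with the transformers ASR pipeline. This is a hedged sketch, not an official example: the model id is taken from the evaluation command in this card, and it assumes an installed `transformers`/`torch` stack, network access for the first download, and 16 kHz mono input audio (the sampling rate wav2vec2 XLS-R was pretrained on).

```python
def transcribe(audio_path: str) -> str:
    """Transcribe an Assamese audio file with the fine-tuned checkpoint."""
    # Imported lazily; the model download happens on the first call.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model="DrishtiSharma/wav2vec2-large-xls-r-300m-as-v9",
    )
    # Input audio should be 16 kHz mono to match the model's expectations.
    return asr(audio_path)["text"]

# Example usage (assumes a local 16 kHz WAV file; path is hypothetical):
# print(transcribe("sample_assamese.wav"))
```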
📚 Documentation
Evaluation Command
- To evaluate on mozilla-foundation/common_voice_8_0 with the test split:
python eval.py --model_id DrishtiSharma/wav2vec2-large-xls-r-300m-as-v9 --dataset mozilla-foundation/common_voice_8_0 --config as --split test --log_outputs
- To evaluate on speech-recognition-community-v2/dev_data:
The Assamese (as) language is not available in speech-recognition-community-v2/dev_data.
Training hyperparameters
The following hyperparameters were used during training:
| Property | Details |
|----------|---------|
| learning_rate | 0.000111 |
| train_batch_size | 16 |
| eval_batch_size | 8 |
| seed | 42 |
| gradient_accumulation_steps | 2 |
| total_train_batch_size | 32 |
| optimizer | Adam with betas=(0.9, 0.999) and epsilon=1e-08 |
| lr_scheduler_type | linear |
| lr_scheduler_warmup_steps | 300 |
| num_epochs | 200 |
| mixed_precision_training | Native AMP |
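Two of the table's entries can be reproduced numerically: the effective batch size is the per-device batch size times the gradient accumulation steps, and `lr_scheduler_type = linear` means the learning rate ramps up over the warmup steps and then decays linearly to zero. The sketch below uses the table's values; the total step count of 3800 is taken from the last row of the training log below and is approximate.

```python
# Effective batch size, derived from the hyperparameter table above.
train_batch_size = 16
gradient_accumulation_steps = 2
total_train_batch_size = train_batch_size * gradient_accumulation_steps  # 32, as listed

def linear_lr(step: int, peak_lr: float = 0.000111,
              warmup_steps: int = 300, total_steps: int = 3800) -> float:
    """Linear warmup to peak_lr over warmup_steps, then linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

# The learning rate peaks at step 300 and reaches zero at the final step.
print(linear_lr(300))   # peak: 0.000111
print(linear_lr(3800))  # end of training: 0.0
```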
Training results
| Training Loss | Epoch | Step | Validation Loss | WER |
|---------------|-------|------|-----------------|-----|
| 8.3852 | 10.51 | 200 | 3.6402 | 1.0 |
| 3.5374 | 21.05 | 400 | 3.3894 | 1.0 |
| 2.8645 | 31.56 | 600 | 1.3143 | 0.8303 |
| 1.1784 | 42.1 | 800 | 0.9417 | 0.6661 |
| 0.7805 | 52.62 | 1000 | 0.9292 | 0.6237 |
| 0.5973 | 63.15 | 1200 | 0.9489 | 0.6014 |
| 0.4784 | 73.67 | 1400 | 0.9916 | 0.5962 |
| 0.4138 | 84.21 | 1600 | 1.0272 | 0.6121 |
| 0.3491 | 94.72 | 1800 | 1.0412 | 0.5984 |
| 0.3062 | 105.26 | 2000 | 1.0769 | 0.6005 |
| 0.2707 | 115.77 | 2200 | 1.0708 | 0.5752 |
| 0.2459 | 126.31 | 2400 | 1.1285 | 0.6009 |
| 0.2234 | 136.82 | 2600 | 1.1209 | 0.5949 |
| 0.2035 | 147.36 | 2800 | 1.1348 | 0.5842 |
| 0.1876 | 157.87 | 3000 | 1.1480 | 0.5872 |
| 0.1669 | 168.41 | 3200 | 1.1496 | 0.5838 |
| 0.1595 | 178.92 | 3400 | 1.1721 | 0.5778 |
| 0.1505 | 189.46 | 3600 | 1.1654 | 0.5744 |
| 0.1486 | 199.97 | 3800 | 1.1679 | 0.5761 |
Framework versions
| Property | Details |
|----------|---------|
| Transformers | 4.16.1 |
| Pytorch | 1.10.0+cu111 |
| Datasets | 1.18.2 |
| Tokenizers | 0.11.0 |
🔧 Technical Details
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the common_voice dataset, trained with the hyperparameters listed above. The final checkpoint reached a validation loss of 1.1679 and a WER of 0.5761 on the evaluation set.
📄 License
This project is licensed under the Apache-2.0 license.