wav2vec2-large-xls-r-300m-assamese Open-source Model - Free Automatic Speech Recognition for Assamese

Wav2vec2 Large Xls R 300m Assamese

Developed by infinitejoy

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the common_voice_7_0 dataset, designed for Assamese automatic speech recognition tasks.

Speech Recognition

Transformers

Other

Open Source License: Apache-2.0

#Assamese speech recognition #XLS-R fine-tuning #Low-resource language processing

Downloads: 13

Release date: 3/2/2022

Model Overview

This is an automatic speech recognition (ASR) model for Assamese, obtained by fine-tuning the XLS-R-300M checkpoint (facebook/wav2vec2-xls-r-300m) for Assamese speech-to-text.
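A minimal inference sketch with the Hugging Face Transformers library. It assumes the model is published on the Hub as infinitejoy/wav2vec2-large-xls-r-300m-assamese (inferred from the developer and model names on this page) and that the repository ships the processor files (feature extractor and CTC vocabulary). librosa is an assumed choice for loading audio and resampling it to the 16 kHz mono input that XLS-R models expect. Decoding is greedy (argmax per frame) with no language model.

```python
# Minimal inference sketch. ASSUMPTIONS: the Hub id below and that the repo
# includes processor files; librosa is an illustrative choice for audio loading.
MODEL_ID = "infinitejoy/wav2vec2-large-xls-r-300m-assamese"  # assumed Hub id
TARGET_SR = 16_000  # wav2vec 2.0 / XLS-R models expect 16 kHz mono audio


def transcribe(path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file with greedy CTC decoding."""
    # Heavy dependencies are imported lazily so this module loads without them.
    import librosa
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id).eval()

    # librosa resamples to 16 kHz and downmixes to mono while loading.
    speech, _ = librosa.load(path, sr=TARGET_SR, mono=True)
    inputs = processor(speech, sampling_rate=TARGET_SR, return_tensors="pt")

    with torch.no_grad():
        logits = model(inputs.input_values).logits  # shape: (batch, frames, vocab)

    # Greedy decoding: best token per frame, then the tokenizer collapses CTC output.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

For example, transcribe("clip.wav") would return the predicted Assamese text for a local file (clip.wav is a placeholder name). Loading the model inside the function keeps the sketch short; for many files, load the processor and model once and reuse them.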

Model Features

Assamese Optimization

Fine-tuned specifically on Assamese speech for recognition in this language.

Based on XLS-R-300M

Uses facebook/wav2vec2-xls-r-300m, a 300M-parameter wav2vec 2.0 model pretrained on unlabeled speech in 128 languages, as the base model.

Trained on Common Voice Dataset

Trained on the Assamese ("as") configuration of the mozilla-foundation/common_voice_7_0 dataset.

Model Capabilities

Assamese speech recognition

Audio-to-text conversion
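In a wav2vec 2.0 CTC model, audio-to-text conversion ends with a decoding step: the model emits one token distribution per audio frame, and the best token per frame is collapsed into text. The sketch below shows greedy CTC collapsing on plain token strings. It assumes the conventions of Transformers' Wav2Vec2CTCTokenizer, where the padding token "<pad>" doubles as the CTC blank and "|" marks word boundaries; the actual vocabulary of this model is not listed on this page.

```python
from itertools import groupby
from typing import Sequence

# ASSUMPTIONS: Wav2Vec2CTCTokenizer-style special tokens.
BLANK = "<pad>"    # pad token used as the CTC blank
WORD_DELIM = "|"   # word delimiter token


def ctc_greedy_decode(frame_tokens: Sequence[str]) -> str:
    """Collapse per-frame best tokens into text: merge repeats, then drop blanks."""
    # 1. Merge consecutive duplicates (one character can span several frames).
    collapsed = [tok for tok, _ in groupby(frame_tokens)]
    # 2. Remove blanks; a blank between two identical tokens keeps them distinct.
    kept = [tok for tok in collapsed if tok != BLANK]
    # 3. Map the word delimiter to a space and normalize whitespace.
    text = "".join(" " if tok == WORD_DELIM else tok for tok in kept)
    return " ".join(text.split())
```

ctc_greedy_decode(["a", "a", "<pad>", "a", "|", "b"]) returns "aa b": the first two frames merge into one "a", and the blank before the third frame lets a second "a" survive. Real frame tokens for this model would be Assamese characters rather than Latin letters.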

Use Cases

Speech Transcription

Assamese Speech-to-Text

Convert Assamese speech content into text

Test WER: 72.64%, test CER: 27.35% (Common Voice 7.0, Assamese test split)
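WER and CER are both edit-distance rates: the minimum number of substitutions, deletions and insertions needed to turn the hypothesis into the reference, divided by the reference length in words (WER) or characters (CER). A single wrong character costs one character edit but a whole word edit, which is why CER is typically much lower than WER, as in the figures above. A minimal pure-Python sketch; it counts spaces as characters for CER, and scoring tools may differ on such details.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (or match at cost 0)
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits over reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits over reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

For "the cat sat" versus "the bat sat", one substituted character gives a WER of 1/3 (about 33%) but a CER of 1/11 (about 9%).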

🚀 wav2vec2-large-xls-r-300m-assamese

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the common_voice_7_0 dataset, intended for automatic speech recognition (transcription) of Assamese audio.

🚀 Quick Start

The model achieves the following results on the evaluation set:

WER: 0.7954545454545454
CER: 0.32341269841269843

📚 Documentation

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

To compute the evaluation metrics (WER and CER) on the Common Voice 7.0 Assamese test split, run:

cd wav2vec2-large-xls-r-300m-assamese; python eval.py --model_id ./ --dataset mozilla-foundation/common_voice_7_0 --config as --split test --log_outputs
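Over a whole test split, WER is usually aggregated at corpus level: total word edits divided by total reference words, rather than the mean of per-utterance WERs. To our understanding, this is how the wer metric distributed with Hugging Face datasets (built on jiwer) aggregates; whether this repository's eval.py applies extra steps, such as text normalization before scoring, is not stated here, so the sketch below illustrates the aggregation only.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = curr
    return prev[-1]


def corpus_wer(references: List[str], hypotheses: List[str]) -> float:
    """Corpus-level WER: sum of word edits over sum of reference words."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_edits = sum(
        edit_distance(ref.split(), hyp.split())
        for ref, hyp in zip(references, hypotheses)
    )
    total_words = sum(len(ref.split()) for ref in references)
    return total_edits / total_words
```

With references ["a b c d", "e"] and hypotheses ["a b c d", "x"], the corpus WER is 1/5 = 0.2, whereas averaging the two per-utterance WERs (0 and 1) would give 0.5. Corpus-level aggregation weights long utterances more heavily.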

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 3e-4
train_batch_size: 16
eval_batch_size: 8
seed: not given
gradient_accumulation_steps: 2
total_train_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
num_epochs: 400
mixed_precision_training: Native AMP
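The batch-size and schedule settings combine as follows. total_train_batch_size equals train_batch_size times gradient_accumulation_steps (16 × 2 = 32), which implies training on a single device; that is an assumption, since the hardware is not listed. A linear scheduler with warmup ramps the learning rate from 0 to 3e-4 over the first 500 steps and then decays it linearly to 0 at the final step. The total number of training steps is not given here, so total_steps is a hypothetical parameter in this sketch, which mirrors the formula of Transformers' get_linear_schedule_with_warmup.

```python
PEAK_LR = 3e-4
WARMUP_STEPS = 500
TRAIN_BATCH_SIZE = 16
GRAD_ACCUM_STEPS = 2
NUM_DEVICES = 1  # ASSUMPTION: single device; the hardware is not stated


def effective_batch_size() -> int:
    """Examples per optimizer update: per-device batch x accumulation x devices."""
    return TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES


def linear_warmup_lr(step: int, total_steps: int) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay to 0.

    `total_steps` is a hypothetical input: the real value is not given.
    """
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / max(1, total_steps - WARMUP_STEPS)
```

For a hypothetical 10,000-step run, linear_warmup_lr(250, 10_000) is 1.5e-4 (halfway through warmup), the rate peaks at 3e-4 at step 500, and it reaches 0 at step 10,000.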

Training results

Training Loss	Epoch	Step	Validation Loss	WER
1.584065	NA	400	1.584065	0.915512
1.658865	NA	800	1.658865	0.805096
1.882352	NA	1200	1.882352	0.820742
1.881240	NA	1600	1.881240	0.810907
2.159748	NA	2000	2.159748	0.804202
1.992871	NA	2400	1.992871	0.803308
2.201436	NA	2800	2.201436	0.802861
2.165218	NA	3200	2.165218	0.793920
2.253643	NA	3600	2.253643	0.796603
2.265880	NA	4000	2.265880	0.790344
2.293935	NA	4400	2.293935	0.797050
2.288851	NA	4800	2.288851	0.784086
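In this table, validation loss rises over training while WER falls, so the two columns point to different checkpoints. The small sketch below picks the best logged step by either column; the (step, validation loss, WER) values are copied from the table above, and they are evaluation-set values logged during training, not the test-set figures.

```python
# (step, validation_loss, wer) rows copied from the training-results table
LOG = [
    (400, 1.584065, 0.915512),
    (800, 1.658865, 0.805096),
    (1200, 1.882352, 0.820742),
    (1600, 1.881240, 0.810907),
    (2000, 2.159748, 0.804202),
    (2400, 1.992871, 0.803308),
    (2800, 2.201436, 0.802861),
    (3200, 2.165218, 0.793920),
    (3600, 2.253643, 0.796603),
    (4000, 2.265880, 0.790344),
    (4400, 2.293935, 0.797050),
    (4800, 2.288851, 0.784086),
]

COLUMNS = {"validation_loss": 1, "wer": 2}


def best_step(column: str) -> int:
    """Return the logged step with the lowest value in the given column."""
    index = COLUMNS[column]
    step, *_ = min(LOG, key=lambda row: row[index])
    return step
```

best_step("wer") returns 4800 (WER 0.784086), while best_step("validation_loss") returns 400 (loss 1.584065). For ASR, selecting by WER is usually the more meaningful choice, since WER is the reported target metric.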

Framework versions

Transformers 4.11.3
Pytorch 1.10.0+cu113
Datasets 1.13.3
Tokenizers 0.10.3

📄 License

This project is licensed under the Apache-2.0 license.

📊 Model Index

Property	Details
Model Type	XLS-R-300M-Assamese
Training Data	mozilla-foundation/common_voice_7_0
Task	Automatic Speech Recognition
Dataset Name	Common Voice 7
Dataset Type	mozilla-foundation/common_voice_7_0
Dataset Args	as
Test WER (%)	72.64
Test CER (%)	27.35
