wav2vec2-large-xls-r-300m-assamese-cv8 Open Source Model - Free Automatic Speech Recognition for Assamese


Developed by infinitejoy

This is an automatic speech recognition (ASR) model for Assamese, fine-tuned from the facebook/wav2vec2-xls-r-300m model on the Assamese portion of Mozilla Common Voice 8.0.

Speech Recognition

Transformers

Other | Open Source License: Apache-2.0 | #Assamese speech recognition #Multi-dialect support #Low-resource optimization

Downloads: 18

Release Time: 3/2/2022

Model Overview

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - AS dataset, built specifically for Assamese speech recognition.

Model Features

Assamese-specific

Speech recognition model fine-tuned specifically for Assamese

Based on XLS-R architecture

Uses Facebook's XLS-R-300M large-scale pre-trained model as the foundation

Fine-tuned on Common Voice dataset

Fine-tuned using the Assamese dataset from Mozilla Common Voice 8.0
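Fine-tuned wav2vec2/XLS-R speech recognizers are usually trained with a CTC (Connectionist Temporal Classification) head on top of the encoder, which predicts one token per audio frame (roughly every 20 ms). The card does not state this explicitly, so treat it as an assumption. The sketch below shows how greedy CTC decoding turns frame-level predictions into text; the token names "<pad>" (CTC blank) and "|" (word delimiter) are assumed from the usual Wav2Vec2CTCTokenizer conventions, so check the model's own vocabulary. In practice, `Wav2Vec2Processor.batch_decode` performs this step for you.

```python
from itertools import groupby

# Assumed token conventions (hypothetical for this model): "<pad>" doubles as the
# CTC blank and "|" marks word boundaries. Verify against the model's vocab.json.
BLANK = "<pad>"
WORD_DELIMITER = "|"


def ctc_greedy_decode(frame_tokens):
    """Greedy CTC decoding over per-frame argmax tokens.

    1. Collapse runs of identical consecutive predictions into one token.
    2. Drop blank tokens (a blank between two identical tokens keeps both).
    3. Map word delimiters to spaces and normalize whitespace.
    """
    collapsed = [token for token, _ in groupby(frame_tokens)]
    kept = [token for token in collapsed if token != BLANK]
    text = "".join(" " if token == WORD_DELIMITER else token for token in kept)
    return " ".join(text.split())
```

For example, the frame sequence `["<pad>", "a", "a", "<pad>", "a", "|", "b"]` decodes to `"aa b"`: the repeated `"a"` frames collapse, but the blank between them keeps two separate letters.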

Model Capabilities

Assamese speech recognition

Speech-to-text

Conversational speech processing

Use Cases

Speech transcription

Assamese speech transcription

Convert Assamese speech content into text

Achieves a WER of 65.966 and a CER of 22.188 on the test set
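WER (word error rate) and CER (character error rate) are both edit-distance ratios: the minimum number of substitutions, deletions, and insertions needed to turn the reference transcript into the model output, divided by the reference length in words or characters. The card does not say which tool computed its numbers, so the sketch below uses the standard definition as an assumption. If the test figures are percentages, a WER of 65.966 means roughly 66 word errors per 100 reference words. Note that WER can exceed 1.0 when the output contains many insertions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (one DP row at a time)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference token
                curr[j - 1] + 1,         # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # substitution (cost 0 on a match)
            )
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate for a single utterance (words split on whitespace)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate for a single utterance (spaces count as characters)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def corpus_wer(references, hypotheses):
    """Corpus WER: total word errors over total reference words (not a mean of per-utterance WERs)."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    return errors / words
```

The corpus-level form matters when comparing numbers: averaging per-utterance WERs weights short utterances more heavily and generally gives a different result.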

Voice assistant

Assamese voice interaction

Supports Assamese voice command recognition

🚀 wav2vec2-large-xls-r-300m-assamese-cv8

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - AS dataset, aiming to provide high-quality automatic speech recognition for Assamese.

🚀 Quick Start

On the evaluation set, this model achieves the following results:

Loss: 0.9814
WER: 0.7402
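The card gives no inference code. The following is a minimal sketch using the Hugging Face `pipeline` API, under stated assumptions: the Hub repository id is inferred from the developer and model names on this card (not confirmed by it), and `transformers` with PyTorch must be installed before `transcribe` is called (decoding audio files from a path also requires ffmpeg). The sampling-rate helper is illustrative: XLS-R was pre-trained on 16 kHz audio, so any raw waveform you prepare yourself should be resampled to 16 kHz first.

```python
from functools import lru_cache

# Assumed Hugging Face Hub id, built from the developer and model names on this card.
MODEL_ID = "infinitejoy/wav2vec2-large-xls-r-300m-assamese-cv8"

# XLS-R was pre-trained on 16 kHz audio, so raw waveforms must be at this rate.
EXPECTED_SAMPLING_RATE = 16_000


def check_sampling_rate(sampling_rate: int) -> None:
    """Raise if a raw waveform is not at the 16 kHz rate the model expects."""
    if sampling_rate != EXPECTED_SAMPLING_RATE:
        raise ValueError(
            f"expected {EXPECTED_SAMPLING_RATE} Hz audio, got {sampling_rate} Hz; resample first"
        )


@lru_cache(maxsize=1)
def load_asr():
    """Build the ASR pipeline once and reuse it (requires transformers + PyTorch)."""
    from transformers import pipeline  # lazy import: only needed when transcribing

    return pipeline("automatic-speech-recognition", model=MODEL_ID)


def transcribe(audio_path: str) -> str:
    """Transcribe one audio file; the pipeline decodes and resamples it via ffmpeg."""
    return load_asr()(audio_path)["text"]
```

For example, `transcribe("recording.wav")` (a hypothetical file) would return the decoded Assamese text. Caching the pipeline avoids reloading the roughly 300M-parameter model on every call. Given the evaluation WER above, expect a substantial share of word errors in the output.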

✨ Features

Fine-tuned: Based on the pre-trained model facebook/wav2vec2-xls-r-300m, fine-tuned on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - AS dataset.
Evaluation Results: Achieved a loss of 0.9814 and a WER of 0.7402 on the evaluation set.

📚 Documentation

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

🔧 Technical Details

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 32
eval_batch_size: 16
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 400
num_epochs: 100.0
mixed_precision_training: Native AMP
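The hyperparameters above specify a linear learning-rate scheduler with 400 warmup steps. The sketch below reproduces the shape of that schedule (the same formula used by `transformers.get_linear_schedule_with_warmup`): the rate rises linearly from 0 to the peak of 0.0003 during warmup, then decays linearly to 0. The total of 2,000 optimizer steps is an assumption read off the training-results table (step 2000 at epoch 100, i.e. 20 steps per epoch).

```python
PEAK_LR = 3e-4       # learning_rate from the card
WARMUP_STEPS = 400   # lr_scheduler_warmup_steps from the card
TOTAL_STEPS = 2000   # assumed: final step in the training-results table


def linear_warmup_lr(step, peak_lr=PEAK_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    return peak_lr * max(0.0, (total - step) / max(1, total - warmup))
```

With these values the rate peaks at step 400, exactly where the results table records a WER of 1.0, and is halfway back down (0.00015) at step 1200.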

Training results

Training Loss	Epoch	Step	Validation Loss	WER
No log	20.0	400	3.1447	1.0
No log	40.0	800	1.0074	0.8556
3.1278	60.0	1200	0.9507	0.7711
3.1278	80.0	1600	0.9730	0.7630
0.8247	100.0	2000	0.9814	0.7402
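The table shows that validation loss and WER do not reach their best values at the same checkpoint: loss is lowest at step 1200 (0.9507), while WER keeps improving through step 2000 (0.7402). The card's reported final results match the last checkpoint. The sketch below, built only from the table rows, makes that comparison explicit; if you retrain, the selection metric can be set with the Trainer's `metric_for_best_model` and `load_best_model_at_end` arguments.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    epoch: float
    step: int
    validation_loss: float
    wer: float


# Rows copied from the training-results table above.
RESULTS = [
    EvalRow(20.0, 400, 3.1447, 1.0),
    EvalRow(40.0, 800, 1.0074, 0.8556),
    EvalRow(60.0, 1200, 0.9507, 0.7711),
    EvalRow(80.0, 1600, 0.9730, 0.7630),
    EvalRow(100.0, 2000, 0.9814, 0.7402),
]

# Choosing by loss and choosing by WER pick different checkpoints.
best_by_loss = min(RESULTS, key=lambda row: row.validation_loss)
best_by_wer = min(RESULTS, key=lambda row: row.wer)
```

Because CTC loss and WER measure different things (probability of the transcript versus errors in the greedy output), selecting by WER is usually the better fit when transcription accuracy is the goal.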

Framework versions

Transformers 4.16.0.dev0
Pytorch 1.10.1+cu102
Datasets 1.18.3
Tokenizers 0.11.0

📄 License

This model is released under the Apache-2.0 license.

📦 Information Table

Property	Details
Model Type	Fine-tuned version of facebook/wav2vec2-xls-r-300m
Training Data	MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - AS dataset
Evaluation Results on Evaluation Set	Loss: 0.9814, WER: 0.7402
Model Index Name	XLS-R-300M - Assamese
Task	Automatic Speech Recognition
Dataset	MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - AS
Test WER	65.966
Test CER	22.188
