🚀 wav2vec2-large-xls-r-300m-sat-a3
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - SAT dataset. It is intended for automatic speech recognition, providing speech-to-text conversion for the Santali (sat) language.
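For illustration, a minimal inference sketch follows. It assumes the transformers and librosa packages and uses the model ID from the evaluation command further down; the audio path and 16 kHz resampling are placeholders, not part of this card.

```python
# Hedged usage sketch: greedy CTC decoding with this fine-tuned XLS-R checkpoint.
# "sample.wav" is a placeholder path; the model ID is taken from the evaluation command below.
import torch
import librosa
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "DrishtiSharma/wav2vec2-large-xls-r-300m-sat-a3"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# Wav2Vec2 expects 16 kHz mono audio.
speech, sr = librosa.load("sample.wav", sr=16_000)

inputs = processor(speech, sampling_rate=sr, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Pick the most likely token per frame; batch_decode collapses repeats and blanks.
pred_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(pred_ids)[0])
```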
✨ Features
- Fine-tuned Model: Based on the pre-trained facebook/wav2vec2-xls-r-300m checkpoint, fine-tuned on the sat subset of mozilla-foundation/common_voice_8_0 for Santali speech recognition.
- Multi-metric Evaluation: Evaluated with both WER (Word Error Rate) and CER (Character Error Rate) to measure performance comprehensively.
📚 Documentation
Model Information
| Property | Details |
|----------|---------|
| Model Type | wav2vec2-large-xls-r-300m-sat-a3 |
| Training Data | mozilla-foundation/common_voice_8_0 |
| License | apache-2.0 |
| Tags | automatic-speech-recognition, mozilla-foundation/common_voice_8_0, generated_from_trainer, sat, robust-speech-event, model_for_talk, hf-asr-leaderboard |
Evaluation Results
The model achieves the following results on the evaluation datasets:
| Task | Dataset | Test WER | Test CER |
|------|---------|----------|----------|
| Automatic Speech Recognition | Common Voice 8 (mozilla-foundation/common_voice_8_0, args: sat) | 0.357429718875502 | 0.14203730272596843 |
| Automatic Speech Recognition | Robust Speech Event - Dev Data (speech-recognition-community-v2/dev_data, args: sat) | NA | NA |
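WER and CER figures of the kind reported above can be reproduced with any standard implementation of these metrics. The sketch below uses the jiwer package on placeholder transcripts; jiwer is an assumed dependency, not something listed in this card.

```python
# Hedged sketch: computing WER and CER with jiwer (assumed dependency).
import jiwer

references  = ["example reference transcript"]   # ground-truth transcripts (placeholders)
predictions = ["example predicted transcript"]   # model outputs (placeholders)

wer = jiwer.wer(references, predictions)   # word error rate
cer = jiwer.cer(references, predictions)   # character error rate
print(f"WER: {wer:.4f}  CER: {cer:.4f}")
```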
Evaluation Commands
- Evaluate on mozilla-foundation/common_voice_8_0 with the test split:

```bash
python eval.py --model_id DrishtiSharma/wav2vec2-large-xls-r-300m-sat-a3 --dataset mozilla-foundation/common_voice_8_0 --config sat --split test --log_outputs
```

- Evaluate on speech-recognition-community-v2/dev_data

Note: the Santali (Ol Chiki) language was not found in speech-recognition-community-v2/dev_data, so no evaluation command is provided.
Training Hyperparameters
The following hyperparameters were used during training (a hedged mapping onto the Trainer API is sketched after this list):
- learning_rate: 0.0004
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 200
- num_epochs: 200
- mixed_precision_training: Native AMP
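For reference, the sketch below shows how the hyperparameters listed above might be expressed as transformers.TrainingArguments. The actual training script is not included in this card, and the output directory is a placeholder.

```python
# Hedged sketch: the listed hyperparameters expressed as Trainer arguments
# (argument names follow the transformers Trainer API; output_dir is a placeholder).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./wav2vec2-large-xls-r-300m-sat-a3",
    learning_rate=4e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,   # effective train batch size: 16 * 2 = 32
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=200,
    num_train_epochs=200,
    fp16=True,                       # native AMP mixed-precision training
)
```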
Training Results
| Training Loss | Epoch | Step | Validation Loss | WER |
|---------------|-------|------|-----------------|-----|
| 11.1266 | 33.29 | 100 | 2.8577 | 1.0 |
| 2.1549 | 66.57 | 200 | 1.0799 | 0.5542 |
| 0.5628 | 99.86 | 300 | 0.7973 | 0.4016 |
| 0.0779 | 133.29 | 400 | 0.8424 | 0.4177 |
| 0.0404 | 166.57 | 500 | 0.9048 | 0.4137 |
| 0.0212 | 199.86 | 600 | 0.8961 | 0.3976 |
Framework Versions
- Transformers 4.16.2
- Pytorch 1.10.0+cu111
- Datasets 1.18.3
- Tokenizers 0.11.0