wav2vec2-large-xls-r-300m-sat-final: an open-source speech recognition model for the Santali language

Wav2vec2 Large Xls R 300m Sat Final

Developed by DrishtiSharma

This is an automatic speech recognition model for the Santali (Ol Chiki) language, fine-tuned from facebook/wav2vec2-xls-r-300m on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - SAT dataset.

Speech Recognition

Transformers

Open Source License: Apache-2.0

Tags: #Santali language recognition #Robust speech processing #Multi-dialect support

Downloads 28

Release Time: 3/2/2022

Model Overview

This is an automatic speech recognition (ASR) model for speech-to-text tasks in the Santali (Ol Chiki) language.

Model Features

Language focus

Fine-tuned specifically for the Santali (Ol Chiki) language, starting from a multilingual base model

Reported accuracy

Achieves a word error rate (WER) of 34.94% and a character error rate (CER) of 13.77% on the Common Voice 8 test set

Based on large-scale pretraining

Fine-tuned from the facebook/wav2vec2-xls-r-300m model, inheriting its pretrained speech representations

Model Capabilities

Speech recognition

Santali (Ol Chiki) language processing

Speech-to-text

Use Cases

Speech transcription

Santali speech transcription

Convert Santali speech into text

Achieved a word error rate of 34.94% on the Common Voice 8 test set

Voice assistant

Santali voice assistant

Provide voice interaction capabilities for Santali language users

🚀 wav2vec2-large-xls-r-300m-sat-final

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - SAT dataset. It is intended for automatic speech recognition, that is, speech-to-text conversion.

✨ Features

Tags: automatic-speech-recognition, mozilla-foundation/common_voice_8_0, generated_from_trainer, sat, robust-speech-event, model_for_talk, hf-asr-leaderboard
Datasets: mozilla-foundation/common_voice_8_0

📚 Documentation

Model Index

Property	Details
Model Name	wav2vec2-large-xls-r-300m-sat-final
Task	Automatic Speech Recognition
Dataset 1	Name: Common Voice 8, Type: mozilla-foundation/common_voice_8_0, Args: sat
Metrics 1	Test WER: 0.3493975903614458, Test CER: 0.13773314203730272
Dataset 2	Name: Robust Speech Event - Dev Data, Type: speech-recognition-community-v2/dev_data, Args: sat
Metrics 2	Test WER: NA, Test CER: NA
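The WER and CER values in the table are edit-distance ratios: the number of word-level (or character-level) substitutions, insertions, and deletions needed to turn the model output into the reference, divided by the reference length. The card does not show how eval.py computes them, so the following is only a minimal sketch of the standard per-utterance definition; the function names and the sample strings are illustrative placeholders, not taken from the evaluation script or the dataset.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate; assumes a non-empty reference."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate; assumes a non-empty reference."""
    return edit_distance(reference, hypothesis) / len(reference)


# Placeholder strings, not real Santali data.
print(wer("a b c d", "a x c"))  # one substitution + one deletion over 4 words
print(cer("abcd", "abxd"))      # one substitution over 4 characters
```

A corpus-level score is usually computed by summing the edits and the reference lengths over all utterances before dividing, which can differ from the mean of per-utterance scores.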

Evaluation Results

This model achieves the following results on the evaluation set:

Loss: 0.8012
Wer: 0.3815

Evaluation Commands

Evaluate on mozilla-foundation/common_voice_8_0 with the test split

python eval.py --model_id DrishtiSharma/wav2vec2-large-xls-r-300m-sat-final --dataset mozilla-foundation/common_voice_8_0 --config sat --split test --log_outputs

Evaluate on speech-recognition-community-v2/dev_data

⚠️ Important Note

Santali (Ol Chiki) language not found in speech-recognition-community-v2/dev_data, so no WER or CER is reported for this dataset.

python eval.py --model_id DrishtiSharma/wav2vec2-large-xls-r-300m-sat-final --dataset speech-recognition-community-v2/dev_data --config sat --split validation --chunk_length_s 10 --stride_length_s 1
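The card gives evaluation commands but no inference snippet. As a hedged sketch, the model could be loaded through the Transformers automatic-speech-recognition pipeline as shown here. The `transcribe` helper and its `audio_path` argument are hypothetical names introduced for illustration; the chunk and stride defaults simply mirror the values in the evaluation command above, and decoding an audio file by path assumes ffmpeg is installed.

```python
MODEL_ID = "DrishtiSharma/wav2vec2-large-xls-r-300m-sat-final"


def transcribe(audio_path, chunk_length_s=10, stride_length_s=1):
    """Transcribe one audio file (hypothetical helper, not part of the model card)."""
    # Imported lazily so the module loads even where transformers is not installed.
    from transformers import pipeline

    # Downloads the model weights from the Hugging Face Hub on first use.
    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)

    # Long recordings are split into overlapping chunks and the outputs are merged.
    result = asr(
        audio_path,
        chunk_length_s=chunk_length_s,
        stride_length_s=stride_length_s,
    )
    return result["text"]
```

Calling `transcribe("sample.wav")` with a path to a Santali recording (the file name is a placeholder) should return the transcription as a string. Building the pipeline once and reusing it would be preferable when transcribing many files.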

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0004
train_batch_size: 16
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 170
num_epochs: 200
mixed_precision_training: Native AMP
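Two of these settings are derived quantities. The total train batch size is the per-device batch size times the gradient accumulation steps (16 × 2 = 32, which implies a single device), and a linear scheduler with warmup raises the learning rate from 0 to its peak over the warmup steps, then decays it linearly to 0. The sketch below reproduces that arithmetic in plain Python. It assumes 600 total optimizer steps, taken from the last step logged in the training results table, and `linear_lr` is an illustrative function, not the trainer's own code.

```python
LEARNING_RATE = 4e-4
WARMUP_STEPS = 170
TOTAL_STEPS = 600        # assumption: last step logged in the training results
PER_DEVICE_BATCH = 16
GRAD_ACCUM_STEPS = 2


def effective_batch_size(per_device, grad_accum, num_devices=1):
    """Number of samples contributing to each optimizer update."""
    return per_device * grad_accum * num_devices


def linear_lr(step, base_lr=LEARNING_RATE, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Learning rate at a given optimizer step under linear warmup and linear decay."""
    if step < warmup:
        return base_lr * step / warmup
    return base_lr * max(0.0, (total - step) / (total - warmup))


print(effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS))  # 32
for step in (0, 85, 170, 385, 600):
    print(step, linear_lr(step))
```

Under these assumptions the peak rate of 0.0004 is reached at step 170, more than a quarter of the way through training.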

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
10.6317	33.29	100	2.8629	1.0
2.047	66.57	200	0.9516	0.5703
0.4475	99.86	300	0.8539	0.3896
0.0716	133.29	400	0.8277	0.3454
0.047	166.57	500	0.7597	0.3655
0.0249	199.86	600	0.8012	0.3815
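The final checkpoint is not the best row in this table: the lowest validation WER occurs at step 400 and the lowest validation loss at step 500, while the reported evaluation results (loss 0.8012, WER 0.3815) correspond to step 600. A short sketch that reads this off the logged rows; the tuple layout is an illustrative choice and the values are copied from the table.

```python
# (training_loss, epoch, step, validation_loss, wer), copied from the table above
ROWS = [
    (10.6317, 33.29, 100, 2.8629, 1.0),
    (2.047, 66.57, 200, 0.9516, 0.5703),
    (0.4475, 99.86, 300, 0.8539, 0.3896),
    (0.0716, 133.29, 400, 0.8277, 0.3454),
    (0.047, 166.57, 500, 0.7597, 0.3655),
    (0.0249, 199.86, 600, 0.8012, 0.3815),
]

best_by_wer = min(ROWS, key=lambda row: row[4])
best_by_loss = min(ROWS, key=lambda row: row[3])
final = ROWS[-1]

print("lowest WER:", best_by_wer[4], "at step", best_by_wer[2])
print("lowest validation loss:", best_by_loss[3], "at step", best_by_loss[2])
print("final checkpoint WER:", final[4], "at step", final[2])
```

Training loss keeps falling to 0.0249 while validation WER rises after step 400, a pattern that is consistent with overfitting on a small training set.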

Framework versions

Transformers 4.16.2
PyTorch 1.10.0+cu111
Datasets 1.18.3
Tokenizers 0.11.0

📄 License

This model is released under the Apache 2.0 license.
