wav2vec2-xls-r-tf-left-right-shuru: an open-source speech recognition model


Developed by hrdipto

A speech recognition model fine-tuned from facebook/wav2vec2-xls-r-300m. It reports a word error rate (WER) of 1.2628 on the evaluation set.

Downloads 29

Release date: 3/2/2022

Model Overview

This model is a fine-tuned version of the wav2vec2-xls-r-300m checkpoint, intended for speech-to-text tasks.
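
As a rough illustration of the speech-to-text use, the sketch below loads the model through the Hugging Face `transformers` ASR pipeline. The Hub ID is an assumption built from the developer and model names on this page, and `transcribe` is a hypothetical helper, not part of the model's published interface.

```python
# Assumed Hub ID: developer name + model name as listed on this page.
MODEL_ID = "hrdipto/wav2vec2-xls-r-tf-left-right-shuru"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Return the transcript of a single audio file (hypothetical helper)."""
    # Imported lazily so this module loads even where transformers is absent.
    from transformers import pipeline

    # Downloads the checkpoint on first use; requires network access.
    asr = pipeline("automatic-speech-recognition", model=model_id)

    # Given a file path, the pipeline decodes the audio (ffmpeg must be
    # installed) and resamples it to the rate the model expects.
    return asr(audio_path)["text"]


# Example call, with a hypothetical file name:
#   text = transcribe("voice_note.wav")
```

The pipeline is rebuilt on every call here to keep the sketch short; for batch transcription it would be cheaper to create it once and reuse it.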

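Word Error Rate

Reports a word error rate (WER) of 1.2628 on the evaluation set. The page does not name the evaluation dataset or say whether the figure is a fraction or a percentage. Read as a fraction, a WER above 1.0 means the transcripts contain more word errors than the references contain words.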
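
To make the metric concrete, here is a minimal sketch of how WER is conventionally computed: the word-level edit distance between reference and hypothesis, divided by the number of reference words. This is the standard definition, not necessarily the exact evaluation script used for this model; `word_error_rate` is a hypothetical helper name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("the cat sat", "the bat sat"))      # one substitution in three words
print(word_error_rate("hello", "hello there big world"))  # three insertions against one word
```

Because insertions count as errors while the denominator is the reference length, the ratio is not capped at 1.0, as the second call shows.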

Based on wav2vec2-xls-r Architecture

Uses facebook/wav2vec2-xls-r-300m as the base model, a multilingual pretrained speech encoder with roughly 300 million parameters.

Mixed Precision Training

Trained with native AMP (automatic mixed precision), which reduces memory use and speeds up training on supported GPUs.

Speech Recognition

Speech-to-Text

Speech Transcription

Meeting Minutes

Automatically convert meeting recordings into text transcripts

Voice Notes

Convert voice notes into editable text

Training Loss	Epoch	Step	Validation Loss	WER
6.5528	23.81	500	0.5509	1.9487
0.2926	47.62	1000	0.1306	1.2756
0.1171	71.43	1500	0.1189	1.2628
0.0681	95.24	2000	0.0921	1.2628
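
The log above supports two small derived readings, sketched below: the number of optimizer steps per epoch, and which logged checkpoint scores best. The rows are copied from the table; the tuple layout, the helper names, and the tie-breaking rule (lower validation loss wins) are assumptions made for illustration.

```python
# Rows copied from the training log:
# (training_loss, epoch, step, validation_loss, wer)
TRAINING_LOG = [
    (6.5528, 23.81, 500, 0.5509, 1.9487),
    (0.2926, 47.62, 1000, 0.1306, 1.2756),
    (0.1171, 71.43, 1500, 0.1189, 1.2628),
    (0.0681, 95.24, 2000, 0.0921, 1.2628),
]


def steps_per_epoch(log):
    """Estimate optimizer steps per epoch from each logged row."""
    return [round(step / epoch) for _, epoch, step, _, _ in log]


def best_checkpoint(log):
    """Pick the row with the lowest WER, breaking ties on validation loss."""
    return min(log, key=lambda row: (row[4], row[3]))


print(steps_per_epoch(TRAINING_LOG))     # every row works out to about 21
print(best_checkpoint(TRAINING_LOG)[2])  # step of the selected checkpoint
```

About 21 steps per epoch points to a small training set relative to the effective batch size, although the page states neither the batch size nor the dataset. WER is identical at steps 1500 and 2000, so the lower validation loss at step 2000 is the only thing separating them.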
