xlsr-wav2vec2-1: an open-source speech recognition model for multilingual speech-to-text tasks

Xlsr Wav2vec2 1

Developed by chrisvinsen

A speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 for multilingual speech-to-text tasks.

Downloads: 20

Release Time: 5/24/2022

Model Overview

This model is a fine-tuned version of wav2vec2-large-xlsr-53 for automatic speech recognition: it converts spoken audio into text.

Multilingual Support

Built on the XLSR architecture, so it can potentially support speech recognition in multiple languages.

Efficient Training

Uses mixed-precision training and gradient accumulation to improve training efficiency.
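
To make the gradient accumulation idea concrete, here is a minimal toy sketch in plain Python. It is not the training code for this model: the one-parameter model, the data points, and the function names `mse_gradient` and `accumulated_gradient` are all assumptions chosen for illustration. It shows only that summing suitably scaled gradients from small micro-batches reproduces the gradient of one large batch, which is why the technique allows a larger effective batch size within limited memory. Mixed precision is not shown here.

```python
# Toy illustration only: a one-parameter model y ~ w * x with MSE loss.
# This is not the model's actual training code.

def mse_gradient(w, batch):
    """Gradient of the mean squared error with respect to w for one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_gradient(w, data, micro_batch_size):
    """Accumulate scaled micro-batch gradients into one effective gradient."""
    if len(data) % micro_batch_size != 0:
        raise ValueError("data must split into equal-sized micro-batches")
    num_micro_batches = len(data) // micro_batch_size
    total = 0.0
    for start in range(0, len(data), micro_batch_size):
        micro_batch = data[start:start + micro_batch_size]
        # Dividing by the number of micro-batches keeps the accumulated
        # gradient equal to the gradient of the mean loss over all samples.
        total += mse_gradient(w, micro_batch) / num_micro_batches
    return total


# Hypothetical data points, roughly following y = 2x.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8),
        (5.0, 10.1), (6.0, 12.2), (7.0, 13.8), (8.0, 16.1)]
w = 0.5

full = mse_gradient(w, data)                     # one large batch of 8
accumulated = accumulated_gradient(w, data, 2)   # four micro-batches of 2
print(abs(full - accumulated) < 1e-9)
```

In a real training loop the optimizer step would be taken only after the last micro-batch, so memory use follows the micro-batch size while the update follows the larger effective batch.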

Continuous Optimization

Over 30 training epochs (the last logged evaluation is at epoch 28.96), the reported word error rate (WER) fell from 1.0 to 0.4412.
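
For readers unfamiliar with the metric, the sketch below shows one common way to compute word error rate: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. This is an illustrative assumption about the definition, not the evaluation script used for this model, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("the cat sat on the mat", "the cat sat on the mat"))
print(word_error_rate("the cat sat", "the dog sat"))
```

Under this definition a WER of 1.0 means the number of word errors equals the number of reference words, and 0.4412 corresponds to roughly 44 errors per 100 reference words. Because insertions count as errors, the value can exceed 1.0.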

Speech-to-text

Multilingual speech recognition

Speech Transcription

Meeting Minutes

Automatically convert meeting recordings into text transcripts

Word error rate 0.4412

Voice Assistant

Serve as the speech recognition component for voice assistants

Training Loss	Epoch	Step	Validation Loss	WER
5.517	1.38	400	3.0431	1.0
1.8387	2.76	800	0.6552	0.7263
0.5971	4.14	1200	0.5308	0.5885
0.4153	5.52	1600	0.4667	0.5551
0.3388	6.9	2000	0.4428	0.5260
0.2803	8.28	2400	0.4915	0.5164
0.2613	9.65	2800	0.4904	0.4988
0.237	11.03	3200	0.4998	0.5075
0.2175	12.41	3600	0.4905	0.4983
0.1969	13.79	4000	0.4818	0.4877
0.1932	15.17	4400	0.5578	0.5006
0.1782	16.55	4800	0.4981	0.4949
0.1655	17.93	5200	0.4978	0.4940
0.1505	19.31	5600	0.5360	0.4896
0.1362	20.69	6000	0.5441	0.4709
0.1246	22.07	6400	0.5358	0.4650
0.1117	23.45	6800	0.5513	0.4716
0.107	24.83	7200	0.5344	0.4578
0.0963	26.21	7600	0.5073	0.4452
0.0846	27.59	8000	0.5335	0.4497
0.0799	28.96	8400	0.5437	0.4412
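
The log above can be inspected programmatically. The following sketch copies the table rows verbatim and looks up the checkpoints with the lowest WER and the lowest validation loss. The names `LogRow`, `RAW_LOG`, and `parse_log` are hypothetical helpers for illustration, and the steps-per-epoch figure is only an estimate derived from the last row.

```python
from typing import NamedTuple


class LogRow(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float
    wer: float


# Rows copied from the training log: loss, epoch, step, validation loss, WER.
RAW_LOG = """\
5.517 1.38 400 3.0431 1.0
1.8387 2.76 800 0.6552 0.7263
0.5971 4.14 1200 0.5308 0.5885
0.4153 5.52 1600 0.4667 0.5551
0.3388 6.9 2000 0.4428 0.5260
0.2803 8.28 2400 0.4915 0.5164
0.2613 9.65 2800 0.4904 0.4988
0.237 11.03 3200 0.4998 0.5075
0.2175 12.41 3600 0.4905 0.4983
0.1969 13.79 4000 0.4818 0.4877
0.1932 15.17 4400 0.5578 0.5006
0.1782 16.55 4800 0.4981 0.4949
0.1655 17.93 5200 0.4978 0.4940
0.1505 19.31 5600 0.5360 0.4896
0.1362 20.69 6000 0.5441 0.4709
0.1246 22.07 6400 0.5358 0.4650
0.1117 23.45 6800 0.5513 0.4716
0.107 24.83 7200 0.5344 0.4578
0.0963 26.21 7600 0.5073 0.4452
0.0846 27.59 8000 0.5335 0.4497
0.0799 28.96 8400 0.5437 0.4412
"""


def parse_log(raw: str) -> list:
    """Turn whitespace-separated log lines into LogRow records."""
    rows = []
    for line in raw.strip().splitlines():
        train_loss, epoch, step, val_loss, wer = line.split()
        rows.append(LogRow(float(train_loss), float(epoch), int(step),
                           float(val_loss), float(wer)))
    return rows


rows = parse_log(RAW_LOG)
best_wer = min(rows, key=lambda r: r.wer)
best_val = min(rows, key=lambda r: r.val_loss)
# Rough estimate only: the epoch column is rounded to two decimals.
steps_per_epoch = round(rows[-1].step / rows[-1].epoch)

print("lowest WER:", best_wer.wer, "at step", best_wer.step)
print("lowest validation loss:", best_val.val_loss, "at step", best_val.step)
print("estimated steps per epoch:", steps_per_epoch)
```

In this log the lowest validation loss (0.4428) occurs at step 2000, while the lowest WER (0.4412) occurs at the final logged step, 8400. Training loss keeps falling throughout, so the two evaluation metrics do not agree on a single best checkpoint.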
