Open-source speech recognition model xlsr-wav2vec2-2 - Free multi-language speech-to-text conversion

xlsr-wav2vec2-2

Developed by chrisvinsen

A speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, a multilingually pre-trained base model, for speech-to-text tasks

Speech Recognition

Transformers

Open Source License: Apache-2.0 #Multilingual speech recognition #Low word error rate #XLSR pre-training

Downloads 20

Release Time: 5/25/2022

Model Overview

This model is a fine-tuned version of facebook/wav2vec2-large-xlsr-53 for speech recognition: it converts spoken audio into text.

Model Features

Multilingual support

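The base model, XLSR-53, was pre-trained on speech in 53 languages. Which languages this fine-tuned checkpoint actually handles depends on its fine-tuning data, which the model card does not specify.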

Efficient fine-tuning

Fine-tuned from the pre-trained base model for the speech recognition task

Reported word error rate

Achieved a word error rate (WER) of 0.4301 (about 43%) on the evaluation set
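
For readers unfamiliar with the metric: WER is the word-level edit distance between the reference transcript and the model output (substitutions, deletions and insertions), divided by the number of reference words. A WER of 0.4301 therefore corresponds to roughly 43 word errors per 100 reference words. The snippet below is a minimal sketch of that definition. The function name `wer` and the example sentences are illustrative assumptions, and this is not the evaluation code used for this model.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences (not taken from the model's evaluation data):
# one substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
score = wer("the cat sat on the mat", "the cat sit on mat")
print(f"WER = {score:.4f}")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.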

Model Capabilities

Speech recognition

Speech-to-text

Multilingual processing

Use Cases

Speech transcription

Meeting minutes

Automatically convert meeting recordings into text transcripts

Word error rate on the evaluation set: 0.4301

Voice notes

Convert voice memos into searchable text

Assistive technology

Real-time caption generation

Generate real-time captions for video or live streaming content

🚀 xlsr-wav2vec2-2

This is a fine-tuned model based on the Transformer architecture, intended for speech-related tasks such as speech recognition.

🚀 Quick Start

This model is a fine-tuned version of facebook/wav2vec2-large-xlsr-53. The model card does not name the fine-tuning dataset (it is recorded as None). It achieves the following results on the evaluation set:

Loss: 0.5884
WER: 0.4301
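
The card itself gives no usage code, so the following is only a sketch of how a wav2vec2 CTC checkpoint is typically run with the Transformers library. Several things here are assumptions rather than facts from the card: the Hub id `chrisvinsen/xlsr-wav2vec2-2` (pieced together from the author and model name on this page), that the repository ships processor and tokenizer files, the use of `librosa` to load audio, and the helper name `transcribe`. wav2vec2 models expect 16 kHz mono audio, so the audio is resampled on load.

```python
# Assumed Hub id, combining the author and model name shown on this page.
MODEL_ID = "chrisvinsen/xlsr-wav2vec2-2"
TARGET_SAMPLE_RATE = 16_000  # wav2vec2 models expect 16 kHz input


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Greedy CTC transcription of a single audio file (hypothetical helper)."""
    # Heavy dependencies are imported lazily so that merely defining this
    # function does not require them to be installed.
    import librosa
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    # Assumes the repository contains both model weights and processor files.
    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()

    # Load as mono and resample to the rate the model was trained on.
    speech, _ = librosa.load(audio_path, sr=TARGET_SAMPLE_RATE, mono=True)
    inputs = processor(
        speech, sampling_rate=TARGET_SAMPLE_RATE, return_tensors="pt", padding=True
    )

    with torch.no_grad():
        logits = model(**inputs).logits

    # Greedy decoding: pick the most likely token per frame; the tokenizer
    # then collapses repeats and removes CTC blank tokens.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]


if __name__ == "__main__":
    # "sample.wav" is a placeholder path, not a file provided with the model.
    print(transcribe("sample.wav"))
```

With a WER of about 0.43 on its evaluation set, transcripts from this checkpoint should be expected to need manual correction.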

🔧 Technical Details

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 8
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 800
num_epochs: 60
mixed_precision_training: Native AMP
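
As a rough illustration of how the batch-size and scheduler entries above fit together, the sketch below recomputes the effective batch size and the learning rate under a linear schedule with warmup. The total of 17,400 optimizer steps is not stated in the card. It is an assumption inferred from the training results table (about 290 steps per epoch over 60 epochs). The function names are hypothetical, and a single training device is assumed because 8 × 2 already equals the reported total batch size of 16.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 3e-4
TRAIN_BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 2
WARMUP_STEPS = 800
NUM_EPOCHS = 60

# Assumption: about 290 optimizer steps per epoch, inferred from the
# training results table (for example, step 11600 at epoch 40.0).
STEPS_PER_EPOCH = 290
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # 17400


def effective_batch_size(per_device: int, accumulation: int, devices: int = 1) -> int:
    """Samples contributing to each optimizer update."""
    return per_device * accumulation * devices


def linear_lr(step: int) -> float:
    """Linear warmup from 0 to the peak rate, then linear decay to 0."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return LEARNING_RATE * remaining / (TOTAL_STEPS - WARMUP_STEPS)


print(effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))
for step in (0, 400, 800, 9100, TOTAL_STEPS):
    print(step, linear_lr(step))
```

Under these assumptions the learning rate peaks at 0.0003 at step 800, which is also where the results table shows the WER first dropping below 1.0.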

Training results

Training Loss	Epoch	Step	Validation Loss	WER
6.6058	1.38	400	3.1894	1.0
2.3145	2.76	800	0.7193	0.7976
0.6737	4.14	1200	0.5338	0.6056
0.4651	5.52	1600	0.5699	0.6007
0.3968	6.9	2000	0.4608	0.5221
0.3281	8.28	2400	0.5264	0.5209
0.2937	9.65	2800	0.5366	0.5096
0.2619	11.03	3200	0.4902	0.5021
0.2394	12.41	3600	0.4706	0.4908
0.2139	13.79	4000	0.5526	0.4871
0.2034	15.17	4400	0.5396	0.5108
0.1946	16.55	4800	0.4959	0.4866
0.1873	17.93	5200	0.4898	0.4877
0.1751	19.31	5600	0.5488	0.4932
0.1668	20.69	6000	0.5645	0.4986
0.1638	22.07	6400	0.5367	0.4946
0.1564	23.45	6800	0.5282	0.4898
0.1566	24.83	7200	0.5489	0.4841
0.1522	26.21	7600	0.5439	0.4821
0.1378	27.59	8000	0.5796	0.4866
0.1459	28.96	8400	0.5603	0.4875
0.1406	30.34	8800	0.6773	0.5005
0.1298	31.72	9200	0.5858	0.4827
0.1268	33.1	9600	0.6007	0.4790
0.1204	34.48	10000	0.5716	0.4734
0.113	35.86	10400	0.5866	0.4748
0.1088	37.24	10800	0.5790	0.4752
0.1074	38.62	11200	0.5966	0.4721
0.1018	40.0	11600	0.5720	0.4668
0.0968	41.38	12000	0.5826	0.4698
0.0874	42.76	12400	0.5937	0.4634
0.0843	44.14	12800	0.6056	0.4640
0.0822	45.52	13200	0.5531	0.4569
0.0806	46.9	13600	0.5669	0.4484
0.072	48.28	14000	0.5683	0.4484
0.0734	49.65	14400	0.5735	0.4437
0.0671	51.03	14800	0.5455	0.4394
0.0617	52.41	15200	0.5838	0.4365
0.0607	53.79	15600	0.6233	0.4397
0.0593	55.17	16000	0.5649	0.4340
0.0551	56.55	16400	0.5923	0.4392
0.0503	57.93	16800	0.5858	0.4325
0.0496	59.31	17200	0.5884	0.4301
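
Two things can be read off the table above. First, the Epoch and Step columns imply about 290 optimizer steps per epoch, which at a total batch size of 16 suggests a training set of roughly 4,600 utterances. This is an estimate, not a figure from the card, and it assumes a single device and ignores a possibly partial last batch. Second, the lowest validation loss (0.4608) occurs at step 2000 while the lowest WER (0.4301) occurs at the final step, so the two metrics disagree about the best checkpoint. The sketch below reproduces both observations from a subset of rows. The helper name `steps_per_epoch` is hypothetical.

```python
# (epoch, step, validation_loss, wer) rows copied from the table above.
# Only a subset is listed to keep the example short.
ROWS = [
    (1.38, 400, 3.1894, 1.0),
    (6.9, 2000, 0.4608, 0.5221),
    (40.0, 11600, 0.5720, 0.4668),
    (59.31, 17200, 0.5884, 0.4301),
]
TOTAL_TRAIN_BATCH_SIZE = 16  # from the hyperparameter list


def steps_per_epoch(rows) -> int:
    """Average step/epoch ratio across rows, rounded to a whole step count."""
    estimates = [step / epoch for epoch, step, _, _ in rows]
    return round(sum(estimates) / len(estimates))


per_epoch = steps_per_epoch(ROWS)
# Rough estimate only: assumes one device and full batches.
approx_training_samples = per_epoch * TOTAL_TRAIN_BATCH_SIZE

best_by_loss = min(ROWS, key=lambda row: row[2])
best_by_wer = min(ROWS, key=lambda row: row[3])

print("steps per epoch:", per_epoch)
print("approx. training samples:", approx_training_samples)
print("lowest validation loss at step", best_by_loss[1])
print("lowest WER at step", best_by_wer[1])
```

Training loss falls steadily to 0.0496 while validation loss drifts upward after step 2000, a pattern often associated with overfitting. WER, the metric that matters for transcription quality, keeps improving through the end of training.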

Framework versions

Transformers 4.19.2
PyTorch 1.11.0+cu113
Datasets 2.2.2
Tokenizers 0.12.1

📄 License

This project is licensed under the Apache-2.0 license.
