wav2vec2-2: Open-Source Speech Recognition Model, Evaluation-Set Word Error Rate 0.8133

wav2vec2-2

Developed by chrisvinsen

A speech recognition model fine-tuned from facebook/wav2vec2-base, with a reported Word Error Rate (WER) of 0.8133 on the evaluation set.

Speech Recognition

Transformers

Open Source License: Apache-2.0 #Speech Recognition #Fine-tuned Model #Low-resource Optimization

Downloads 16

Release Time: 5/22/2022

Model Overview

This model is a fine-tuned wav2vec2 checkpoint for automatic speech recognition, intended for applications that convert speech to text.

Model Features

Based on wav2vec2 Architecture

Uses Facebook's wav2vec2-base as the foundation model, which learns speech representations from raw audio through self-supervised pretraining

Fine-tuning Optimization

Fine-tuned for speech recognition; the model card does not name the training dataset

Reported Word Error Rate

Reaches a Word Error Rate (WER) of 0.8133 on the evaluation set, that is, roughly 81 word errors per 100 reference words (lower is better)

Model Capabilities

Speech Recognition

Audio-to-Text Conversion

Use Cases

Speech Transcription

Meeting Minutes

Automatically convert meeting recordings into text transcripts

Voice Notes

Convert voice memos into searchable text

Assistive Technology

Speech-to-Text Services

Provide real-time captioning services for the hearing impaired

🚀 wav2vec2-2

wav2vec2-2 is a speech recognition model fine-tuned from facebook/wav2vec2-base. The model card lists the training dataset as "None", so the data it was trained on is unspecified.

🚀 Quick Start

This model is a fine-tuned version of facebook/wav2vec2-base on an unspecified dataset. It achieves the following results on the evaluation set:

Loss: 0.9253
WER: 0.8133
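
The card itself gives no usage code, so the following is only a minimal sketch of how a checkpoint like this is commonly loaded with the Transformers `pipeline` API. The Hub id `chrisvinsen/wav2vec2-2` is an assumption inferred from the author and model name shown on this page, and `build_transcriber` and `transcribe` are hypothetical helper names. Decoding an audio file by path also assumes ffmpeg is available on the system.

```python
# Assumption: the checkpoint is published on the Hugging Face Hub under this id,
# inferred from the author ("chrisvinsen") and the model name ("wav2vec2-2").
MODEL_ID = "chrisvinsen/wav2vec2-2"


def build_transcriber(model_id: str = MODEL_ID):
    """Create an automatic-speech-recognition pipeline for the checkpoint."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model_id)


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file and return the recognized text."""
    asr = build_transcriber(model_id)
    # The pipeline decodes the file and resamples it to the rate expected by
    # the model's feature extractor (16 kHz for wav2vec2-base).
    return asr(audio_path)["text"]


if __name__ == "__main__":
    import sys

    # Usage: python transcribe.py path/to/audio.wav
    if len(sys.argv) > 1:
        print(transcribe(sys.argv[1]))
```

Given the reported WER of 0.8133, transcripts from this checkpoint should be expected to contain many errors.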

📚 Documentation

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 400
num_epochs: 10
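
To make the `linear` scheduler with 400 warmup steps concrete, the sketch below computes the learning rate at a given optimizer step: a linear ramp from 0 to the peak rate during warmup, then a linear decay to 0. The function name `linear_schedule_lr` is hypothetical, and the total of 5810 steps is an assumption estimated from the results table (about 581 steps per epoch over 10 epochs), not a value stated in the card.

```python
PEAK_LR = 1e-05       # learning_rate from the card
WARMUP_STEPS = 400    # lr_scheduler_warmup_steps from the card
TOTAL_STEPS = 5810    # assumption: ~581 steps/epoch x 10 epochs, estimated from the table


def linear_schedule_lr(step: int,
                       peak_lr: float = PEAK_LR,
                       warmup_steps: int = WARMUP_STEPS,
                       total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from the peak to 0 at the final step.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for s in (0, 200, 400, 3000, 5810):
        print(s, linear_schedule_lr(s))
```

Under these assumptions the rate peaks at step 400, which falls before the end of the first epoch.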

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
8.4469	0.34	200	3.7440	1.0
3.1152	0.69	400	3.3755	1.0
2.9228	1.03	600	3.0427	1.0
2.8661	1.38	800	2.9406	1.0
2.8402	1.72	1000	2.9034	1.0
2.8301	2.07	1200	2.8850	1.0
2.8088	2.41	1400	2.8479	1.0
2.6892	2.75	1600	2.5800	1.0
2.3249	3.1	1800	2.1310	1.0
1.9687	3.44	2000	1.7652	0.9982
1.7338	3.79	2200	1.5430	0.9974
1.5698	4.13	2400	1.3927	0.9985
1.4475	4.48	2600	1.3186	0.9911
1.3764	4.82	2800	1.2406	0.9647
1.3022	5.16	3000	1.1954	0.9358
1.2409	5.51	3200	1.1450	0.8990
1.1989	5.85	3400	1.1107	0.8794
1.1478	6.2	3600	1.0839	0.8667
1.106	6.54	3800	1.0507	0.8573
1.0792	6.88	4000	1.0179	0.8463
1.0636	7.23	4200	0.9974	0.8355
1.0224	7.57	4400	0.9757	0.8343
1.0166	7.92	4600	0.9641	0.8261
0.9925	8.26	4800	0.9553	0.8183
0.9934	8.61	5000	0.9466	0.8199
0.9741	8.95	5200	0.9353	0.8172
0.9613	9.29	5400	0.9331	0.8133
0.9714	9.64	5600	0.9272	0.8144
0.9593	9.98	5800	0.9253	0.8133
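
For reading the Wer column, here is a minimal sketch of the standard word error rate definition: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The card does not say which implementation produced its numbers, so `word_error_rate` is a hypothetical illustration rather than the exact metric code used in training.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Under this definition a value of 1.0, as in the first rows of the table, means the error count equals the number of reference words, and the final 0.8133 corresponds to about 81 errors per 100 reference words. Because insertions count as errors, the value can also exceed 1.0.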

Framework versions

Transformers 4.19.2
Pytorch 1.11.0+cu113
Datasets 2.2.2
Tokenizers 0.12.1

📄 License

This project is licensed under the Apache-2.0 license.
