wav2vec2-10: Open-Source Speech Recognition Model Fine-Tuned from facebook/wav2vec2-base

Wav2vec2 10

Developed by chrisvinsen

A speech recognition model fine-tuned from facebook/wav2vec2-base. It reports a Word Error Rate (WER) of 1.0, i.e. 100%, on the evaluation set.

Speech Recognition

Transformers

Open Source License: Apache-2.0 #Speech Recognition #Fine-tuned Model

Downloads 20

Release Time: 5/23/2022

Model Overview

A speech recognition model based on the wav2vec2 architecture, fine-tuned to convert speech to text.

Model Features

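Reported Word Error Rate

Reports a Word Error Rate (WER) of 1.0 on the evaluation set. WER is the fraction of reference words transcribed incorrectly, so 1.0 corresponds to a 100% error rate, not a low one.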

Based on wav2vec2 Architecture

Uses facebook/wav2vec2-base as the base model for fine-tuning

Optimized Training

Trained for 30 epochs using a linear learning rate scheduler with 400 warmup steps and the Adam optimizer.

Model Capabilities

Speech Recognition

Audio-to-Text

Use Cases

Speech Transcription

Meeting Minutes

Automatically convert meeting recordings into text transcripts

Reported Word Error Rate on the evaluation set: 1.0

Voice Notes

Convert voice memos into searchable text

🚀 wav2vec2-10

This model is a fine-tuned version of facebook/wav2vec2-base. The training dataset is not specified (the source card lists it as None). It achieves the following results on the evaluation set:

Loss: 3.0354
Wer: 1.0
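
To make the Wer figure concrete: WER is conventionally computed as the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the evaluation code used for this model (the card does not show it). The function name `word_error_rate` and the sample sentences are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Illustrative inputs, not taken from the model's evaluation data.
perfect = word_error_rate("the cat sat", "the cat sat")
one_wrong = word_error_rate("the cat sat", "the dog sat")
empty_output = word_error_rate("the cat sat", "")
```

Under this definition a score of 1.0 means the number of errors equals the number of reference words, which is what an empty transcript would score, for instance.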

📚 Documentation

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 16
eval_batch_size: 4
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 400
num_epochs: 30
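
The learning rate implied by these settings can be sketched as follows, assuming the usual shape of a linear schedule with warmup: a linear ramp from 0 to the peak rate over the warmup steps, then a linear decay to 0 at the final step. The function name `linear_schedule_lr` is hypothetical, and `total_steps=7650` is an estimate (roughly 255 steps per epoch, inferred from the training results table, times 30 epochs); the card does not state the total.

```python
def linear_schedule_lr(
    step: int,
    base_lr: float = 3e-4,     # learning_rate from the card
    warmup_steps: int = 400,   # lr_scheduler_warmup_steps from the card
    total_steps: int = 7650,   # ASSUMPTION: estimated, not stated in the card
) -> float:
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: fall linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at a few of the logged steps from the results table.
for logged_step in (200, 400, 4000, 7600):
    print(logged_step, linear_schedule_lr(logged_step))
```

Under these assumptions the rate peaks at step 400, the second logged evaluation, and is close to zero by the last logged step.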

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
4.2231	0.78	200	3.0442	1.0
2.8665	1.57	400	3.0081	1.0
2.8596	2.35	600	3.0905	1.0
2.865	3.14	800	3.0443	1.0
2.8613	3.92	1000	3.0316	1.0
2.8601	4.71	1200	3.0574	1.0
2.8554	5.49	1400	3.0261	1.0
2.8592	6.27	1600	3.0785	1.0
2.8606	7.06	1800	3.1129	1.0
2.8547	7.84	2000	3.0647	1.0
2.8565	8.63	2200	3.0624	1.0
2.8633	9.41	2400	2.9900	1.0
2.855	10.2	2600	3.0084	1.0
2.8581	10.98	2800	3.0092	1.0
2.8545	11.76	3000	3.0299	1.0
2.8583	12.55	3200	3.0293	1.0
2.8536	13.33	3400	3.0566	1.0
2.8556	14.12	3600	3.0385	1.0
2.8573	14.9	3800	3.0098	1.0
2.8551	15.69	4000	3.0623	1.0
2.8546	16.47	4200	3.0964	1.0
2.8569	17.25	4400	3.0648	1.0
2.8543	18.04	4600	3.0377	1.0
2.8532	18.82	4800	3.0454	1.0
2.8579	19.61	5000	3.0301	1.0
2.8532	20.39	5200	3.0364	1.0
2.852	21.18	5400	3.0187	1.0
2.8561	21.96	5600	3.0172	1.0
2.8509	22.75	5800	3.0420	1.0
2.8551	23.53	6000	3.0309	1.0
2.8552	24.31	6200	3.0416	1.0
2.8521	25.1	6400	3.0469	1.0
2.852	25.88	6600	3.0489	1.0
2.854	26.67	6800	3.0394	1.0
2.8572	27.45	7000	3.0336	1.0
2.8502	28.24	7200	3.0363	1.0
2.8557	29.02	7400	3.0304	1.0
2.8522	29.8	7600	3.0354	1.0

Framework versions

Transformers 4.19.2
Pytorch 1.11.0+cu113
Datasets 2.2.2
Tokenizers 0.12.1

📄 License

The model is licensed under the Apache 2.0 license.
