wav2vec2-base-timit-demo-google-colab Open-source Speech Recognition Model

Developed by neweasterns

This speech recognition model is facebook/wav2vec2-base fine-tuned on the TIMIT dataset, intended primarily for English speech-to-text tasks.

Speech Recognition

Transformers

Open Source License: Apache-2.0 #Speech Recognition #Low Word Error Rate #TIMIT Dataset

Downloads 100

Release Time: 6/27/2022

Model Overview

A speech recognition model based on the wav2vec2 architecture, fine-tuned on the TIMIT dataset, capable of converting English speech into text.

Model Features

Efficient Fine-tuning

Fine-tuned from the pre-trained wav2vec2-base model. During fine-tuning, the evaluation WER fell from 0.9991 after the first epoch to 0.3388 at the last logged checkpoint.

Low Word Error Rate

Over 30 training epochs, the Word Error Rate (WER) on the evaluation set dropped to 0.3388, with a validation loss of 0.5206.

Optimized Training

Uses the Adam optimizer with a linear learning rate scheduler and 1000 warm-up steps, which helps keep early training stable.
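The schedule named under Optimized Training can be sketched in a few lines. This is a minimal illustration of linear warm-up followed by linear decay, not the training code itself. The base learning rate and warm-up length come from the hyperparameter list further down; the total step count is an assumption, estimated as about 498 steps per epoch from the step and epoch columns of the training results table.

```python
BASE_LR = 1e-4        # learning_rate from the hyperparameter list
WARMUP_STEPS = 1000   # lr_scheduler_warmup_steps
# Assumption: roughly 498 optimizer steps per epoch (inferred from the
# step/epoch columns of the training results table), times 30 epochs.
TOTAL_STEPS = 498 * 30


def linear_lr(step: int,
              base_lr: float = BASE_LR,
              warmup_steps: int = WARMUP_STEPS,
              total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at `step`: linear warm-up, then linear decay to zero."""
    if step < warmup_steps:
        # Ramp from 0 up to base_lr over the warm-up phase.
        return base_lr * step / max(1, warmup_steps)
    # Decay from base_lr down to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for step in (0, 500, 1000, 7970, TOTAL_STEPS):
        print(f"step {step:>5}: lr = {linear_lr(step):.2e}")
```

Under these assumptions the rate peaks at 1e-4 at step 1000 and returns to zero at the final step.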

Model Capabilities

English Speech Recognition

Speech-to-Text

Automatic Speech Recognition
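A minimal sketch of how these capabilities might be used with the Transformers library follows. The repository ID is an assumption built from the developer and model names on this page, and the sketch also assumes the repository ships processor files alongside the weights. The `transcribe` helper is hypothetical; it expects a one-dimensional float waveform already resampled to 16 kHz.

```python
# Assumption: repository ID built from the developer and model names on this page.
MODEL_ID = "neweasterns/wav2vec2-base-timit-demo-google-colab"
SAMPLE_RATE = 16_000  # wav2vec2-base expects 16 kHz mono audio


def transcribe(waveform, model_id: str = MODEL_ID) -> str:
    """Transcribe a 1-D float waveform (16 kHz) using greedy CTC decoding."""
    # Heavy dependencies are imported lazily so importing this module stays cheap.
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()

    # Normalize the raw audio and wrap it in a batch of size one.
    inputs = processor(waveform, sampling_rate=SAMPLE_RATE, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits

    # Pick the most likely token per frame; batch_decode collapses CTC repeats.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Audio recorded at another sample rate would need resampling before being passed in.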

Use Cases

Speech Transcription

Meeting Minutes

Automatically convert English meeting recordings into text transcripts.

Reported Word Error Rate is about 34% on the evaluation set; accuracy on meeting recordings is not reported.

Voice Command Recognition

Recognize English voice commands as text, which downstream software can then map to executable commands.

Education

Pronunciation Assessment

Used for evaluating the pronunciation accuracy of English learners.

🚀 wav2vec2-base-timit-demo-google-colab

This model is a fine-tuned version of facebook/wav2vec2-base. The auto-generated model card lists the training dataset as "None"; the overview above identifies it as TIMIT.

🚀 Quick Start

On the evaluation set, the model achieves the following results:

Loss: 0.5206
Wer: 0.3388
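For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words. The sketch below computes it for a single sentence pair. The function name is illustrative, and this is not necessarily the exact routine used to produce the figure above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j]: edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 / 6.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

A WER of 0.3388 therefore means roughly one word-level error for every three reference words.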

📚 Documentation

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 1000
num_epochs: 30
mixed_precision_training: Native AMP
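As a rough guide to reproducing this setup, the hyperparameters above might map onto Transformers `TrainingArguments` keyword arguments as sketched below. The mapping is an assumption: it treats the listed batch sizes as per-device values on a single GPU, and `build_training_arguments` is a hypothetical helper, not part of the original training script.

```python
# Assumed mapping from the card's hyperparameter names to
# TrainingArguments keyword arguments (single-GPU setup).
TRAINING_KWARGS = {
    "learning_rate": 1e-4,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "warmup_steps": 1000,
    "num_train_epochs": 30,
    "fp16": True,  # "Native AMP" mixed precision; requires a CUDA GPU
}


def build_training_arguments(output_dir: str):
    """Create TrainingArguments from the settings above (hypothetical helper)."""
    # Imported lazily so the settings can be inspected without Transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```

The resulting object would then be passed to a `Trainer` together with the model, data collator, and datasets, none of which are described in this card.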

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
3.5597	1.0	500	2.3415	0.9991
0.9759	2.01	1000	0.5556	0.5382
0.4587	3.01	1500	0.7690	0.4781
0.3156	4.02	2000	0.7994	0.4412
0.2272	5.02	2500	0.8948	0.4120
0.1921	6.02	3000	0.7065	0.3940
0.1618	7.03	3500	0.4333	0.3855
0.1483	8.03	4000	0.4232	0.3872
0.156	9.04	4500	0.4172	0.3749
0.1138	10.04	5000	0.4084	0.3758
0.1045	11.04	5500	0.4665	0.3623
0.0908	12.05	6000	0.4416	0.3684
0.0788	13.05	6500	0.4801	0.3659
0.0773	14.06	7000	0.4560	0.3583
0.0684	15.06	7500	0.4878	0.3610
0.0645	16.06	8000	0.4635	0.3567
0.0577	17.07	8500	0.5245	0.3548
0.0547	18.07	9000	0.5265	0.3639
0.0466	19.08	9500	0.5161	0.3546
0.0432	20.08	10000	0.5263	0.3558
0.0414	21.08	10500	0.4874	0.3500
0.0365	22.09	11000	0.5266	0.3472
0.0321	23.09	11500	0.5422	0.3458
0.0325	24.1	12000	0.5201	0.3428
0.0262	25.1	12500	0.5208	0.3398
0.0249	26.1	13000	0.5034	0.3429
0.0262	27.11	13500	0.5055	0.3396
0.0248	28.11	14000	0.5164	0.3404
0.0222	29.12	14500	0.5206	0.3388
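The table can be scanned programmatically to see which checkpoint scores best on each metric. The sketch below copies the step, validation loss, and WER columns from the rows above; the variable names are illustrative.

```python
# (step, validation_loss, wer) copied from the training results table
RESULTS = [
    (500, 2.3415, 0.9991), (1000, 0.5556, 0.5382), (1500, 0.7690, 0.4781),
    (2000, 0.7994, 0.4412), (2500, 0.8948, 0.4120), (3000, 0.7065, 0.3940),
    (3500, 0.4333, 0.3855), (4000, 0.4232, 0.3872), (4500, 0.4172, 0.3749),
    (5000, 0.4084, 0.3758), (5500, 0.4665, 0.3623), (6000, 0.4416, 0.3684),
    (6500, 0.4801, 0.3659), (7000, 0.4560, 0.3583), (7500, 0.4878, 0.3610),
    (8000, 0.4635, 0.3567), (8500, 0.5245, 0.3548), (9000, 0.5265, 0.3639),
    (9500, 0.5161, 0.3546), (10000, 0.5263, 0.3558), (10500, 0.4874, 0.3500),
    (11000, 0.5266, 0.3472), (11500, 0.5422, 0.3458), (12000, 0.5201, 0.3428),
    (12500, 0.5208, 0.3398), (13000, 0.5034, 0.3429), (13500, 0.5055, 0.3396),
    (14000, 0.5164, 0.3404), (14500, 0.5206, 0.3388),
]

# Checkpoint with the lowest WER and with the lowest validation loss.
best_wer = min(RESULTS, key=lambda row: row[2])
best_loss = min(RESULTS, key=lambda row: row[1])

if __name__ == "__main__":
    print(f"lowest WER {best_wer[2]} at step {best_wer[0]}")
    print(f"lowest validation loss {best_loss[1]} at step {best_loss[0]}")
```

The lowest WER (0.3388) occurs at step 14500, while the lowest validation loss (0.4084) occurs much earlier, at step 5000. WER keeps improving after validation loss stops falling, so the two metrics favor different checkpoints.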

Framework versions

Transformers 4.17.0
PyTorch 1.11.0+cu113
Datasets 1.18.3
Tokenizers 0.12.1

📄 License

This project is licensed under the Apache-2.0 license.
