wav2vec2-base-timit-demo-google-colab Open-source Speech Recognition Model

Wav2vec2 Base Timit Demo Google Colab

Developed by wrice

This speech recognition model is fine-tuned from facebook/wav2vec2-base on the TIMIT dataset for English speech-to-text.

Speech Recognition

Transformers

Open Source License: Apache-2.0 #Speech Recognition #TIMIT Dataset


Release Time: 5/25/2022

Model Overview

This is a wav2vec2 model fine-tuned for English speech recognition. After fine-tuning on the TIMIT dataset, it reaches a word error rate (WER) of 0.3204 on its evaluation set.

Model Features

Efficient Speech Recognition

After fine-tuning on the TIMIT dataset, it reaches a WER of 0.3204 on the evaluation set, i.e. about 32 word errors (substitutions, deletions, or insertions) per 100 reference words.
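For reference, WER is the word-level edit distance (substitutions + deletions + insertions) between the model's transcript and the reference text, divided by the number of reference words. The sketch below is a plain-Python version of that standard definition; it is not the metric code used during training, which the card does not show. Because insertions count as errors, WER can exceed 1.0.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix of the previous row and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from the hypothesis
                curr[j - 1] + 1,     # insertion: extra word in the hypothesis
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution (sat -> sit) and one deletion ("the") over 6 reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```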

Based on wav2vec2 Architecture

Uses facebook/wav2vec2-base as the base model, a speech encoder pre-trained with self-supervision on unlabeled audio that learns speech representations directly from the raw waveform.
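To make the feature-extraction step concrete: wav2vec2-base begins with a stack of seven 1-D convolutions over the raw 16 kHz waveform. The kernel sizes and strides below are the defaults of Transformers' `Wav2Vec2Config` (`conv_kernel`, `conv_stride`), assumed unchanged in this checkpoint. Under that assumption, each output frame sees 400 samples (25 ms), consecutive frames are 320 samples (20 ms) apart, and one second of audio yields 49 frames.

```python
# Default Wav2Vec2Config feature-encoder settings (assumed unchanged in this checkpoint).
CONV_KERNEL = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDE = (5, 2, 2, 2, 2, 2, 2)


def num_output_frames(num_samples: int) -> int:
    """Frames produced by the unpadded conv stack: floor((L - kernel) / stride) + 1 per layer."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNEL, CONV_STRIDE):
        length = (length - kernel) // stride + 1
    return length


def receptive_field_and_hop() -> tuple[int, int]:
    """Samples covered by one output frame, and samples between consecutive frames."""
    field, hop = 1, 1
    for kernel, stride in zip(CONV_KERNEL, CONV_STRIDE):
        field += (kernel - 1) * hop  # each layer widens the view by (kernel - 1) input hops
        hop *= stride                # strides multiply into the overall frame spacing
    return field, hop


print(num_output_frames(16_000))  # one second of 16 kHz audio
print(receptive_field_and_hop())
```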

Lightweight Deployment

As a base-size model (about 95M parameters, versus about 317M for the large wav2vec2 variant), it is better suited to resource-constrained environments than the larger checkpoints.
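A back-of-the-envelope check on the footprint, counting weights only (no activations or runtime overhead). The 95M figure is the approximate BASE size reported for wav2vec 2.0, and the int8 row assumes a quantization step that this card does not describe.

```python
# Approximate parameter count for wav2vec2-base (assumption: ~95M, CTC head ignored).
APPROX_PARAMS = 95_000_000

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_mb(num_params: int, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bytes_per_param / 1e6


for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype}: ~{weight_memory_mb(APPROX_PARAMS, nbytes):.0f} MB")
```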

Model Capabilities

English Speech Recognition

Speech-to-Text

Audio Content Analysis

Use Cases

Speech Transcription

Automated Meeting Minutes

Automatically convert English meeting recordings into text transcripts
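Meeting recordings are much longer than single utterances, and self-attention cost grows with input length, so long audio is usually transcribed in overlapping windows. The helper below is a hypothetical sketch of that windowing; `chunk_spans`, the 20 s window, and the 2 s overlap are illustrative choices, not settings from this card. Transformers' ASR pipeline also offers built-in chunking through its `chunk_length_s` argument.

```python
def chunk_spans(num_samples: int, sr: int = 16_000,
                chunk_s: float = 20.0, overlap_s: float = 2.0) -> list[tuple[int, int]]:
    """Return (start, end) sample indices of overlapping windows covering the whole signal."""
    chunk = int(chunk_s * sr)
    step = chunk - int(overlap_s * sr)
    if step <= 0:
        raise ValueError("overlap must be shorter than the chunk")
    spans = []
    start = 0
    while True:
        end = min(start + chunk, num_samples)
        spans.append((start, end))
        if end >= num_samples:  # last window reaches the end of the recording
            break
        start += step
    return spans


# 50 s of 16 kHz audio: windows at 0-20 s, 18-38 s and 36-50 s.
print(chunk_spans(50 * 16_000))
```

Each window would be transcribed separately; removing words duplicated in the overlaps is the tricky part, which the pipeline handles internally through its stride settings.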

Word error rate of 0.3204 on the evaluation set, i.e. roughly 68% word accuracy if accuracy is taken as 1 - WER (67.96%)

Voice Assistant

Can be used to recognize spoken English commands
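One way a transcript could drive voice commands is fuzzy matching against a fixed command list, which tolerates small recognition errors. This is a hypothetical sketch using the standard library's `difflib.get_close_matches`; the command list, `match_command`, and the 0.6 cutoff are illustrative.

```python
import difflib
from typing import Optional

# Hypothetical command vocabulary for an assistant.
COMMANDS = ["turn on the lights", "turn off the lights", "play music", "stop"]


def match_command(transcript: str, commands: list[str] = COMMANDS,
                  cutoff: float = 0.6) -> Optional[str]:
    """Return the closest known command, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(transcript.lower().strip(), commands, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(match_command("turn of the lights"))  # small recognition error still maps to a command
print(match_command("what's the weather"))  # unrelated request: no match
```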

Education

Pronunciation Assessment

Helps English learners evaluate their pronunciation accuracy
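A simple way to use a transcript for pronunciation practice is to align it word by word with the sentence the learner was asked to read and list the mismatches. The sketch below uses `difflib.SequenceMatcher` from the standard library; `word_mismatches` is a hypothetical helper. Because the model itself makes errors (WER 0.3204), a mismatch is only a hint that a word may have been mispronounced.

```python
import difflib


def word_mismatches(prompt: str, transcript: str) -> list[tuple[str, str, str]]:
    """Align prompt and transcript word by word; return (operation, expected, heard) per difference."""
    expected = prompt.lower().split()
    heard = transcript.lower().split()
    matcher = difflib.SequenceMatcher(a=expected, b=heard, autojunk=False)
    issues = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":  # op is "replace", "delete" or "insert"
            issues.append((op, " ".join(expected[i1:i2]), " ".join(heard[j1:j2])))
    return issues


print(word_mismatches("she had your dark suit", "she had your duck suit"))
# [('replace', 'dark', 'duck')]
```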

🚀 wav2vec2-base-timit-demo-google-colab

This model is a fine-tuned version of facebook/wav2vec2-base on an unspecified dataset (the auto-generated card lists it as "None"; the model name indicates TIMIT). Its results on the evaluation set are reported below.

🚀 Quick Start

This section provides an overview of the model and its performance on the evaluation set.

This model achieves the following results on the evaluation set:

Loss: 0.6348
WER: 0.3204
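The card does not include inference code. The sketch below shows one common way to run a wav2vec2 CTC checkpoint with Transformers. It assumes the model is published on the Hugging Face Hub as `wrice/wav2vec2-base-timit-demo-google-colab` together with its processor files, and that input audio must be mono 16 kHz as for facebook/wav2vec2-base. `prepare_audio` and `transcribe` are hypothetical helpers, and the `np.interp` resampling is a crude stand-in for a proper resampler.

```python
import numpy as np

MODEL_ID = "wrice/wav2vec2-base-timit-demo-google-colab"  # assumed Hub repo id
TARGET_SR = 16_000  # wav2vec2-base expects 16 kHz input


def prepare_audio(samples: np.ndarray, sr: int) -> np.ndarray:
    """Downmix to mono and resample to 16 kHz (linear interpolation, no anti-aliasing filter)."""
    audio = np.asarray(samples, dtype=np.float32)
    if audio.ndim == 2:  # assume shape (num_samples, channels)
        audio = audio.mean(axis=1)
    if sr != TARGET_SR:
        n_out = int(round(len(audio) * TARGET_SR / sr))
        t_in = np.arange(len(audio)) / sr
        t_out = np.arange(n_out) / TARGET_SR
        audio = np.interp(t_out, t_in, audio)
    return audio.astype(np.float32)


def transcribe(samples: np.ndarray, sr: int) -> str:
    """Greedy (argmax) CTC transcription; downloads the checkpoint on first use."""
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
    model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID).eval()
    inputs = processor(prepare_audio(samples, sr), sampling_rate=TARGET_SR, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

For example, audio loaded with `soundfile.read(path)`, which returns `(data, samplerate)` with multi-channel data shaped `(frames, channels)`, can be passed as `transcribe(data, samplerate)`.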

📚 Documentation

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 4
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 1000
num_epochs: 30
mixed_precision_training: Native AMP
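With `lr_scheduler_type: linear` and 1000 warmup steps, the learning rate rises linearly from 0 to 0.0001 over the first 1000 steps and then decays linearly to 0 at the final step. The function below mirrors the multiplier used by Transformers' `get_linear_schedule_with_warmup`. The total step count is not stated in the card; the default of 29,850 is an estimate assuming about 995 steps per epoch (step 29500 is logged at epoch 29.65) over 30 epochs.

```python
def linear_warmup_lr(step: int, base_lr: float = 1e-4,
                     warmup_steps: int = 1000, total_steps: int = 29_850) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero.

    total_steps is an estimate (assumption), not a value reported in the card.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)


for step in (0, 500, 1000, 15_000, 29_500):
    print(step, f"{linear_warmup_lr(step):.2e}")
```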

Training results

Training Loss	Epoch	Step	Validation Loss	WER
4.2767	0.5	500	2.9921	1.0
1.509	1.01	1000	0.8223	0.6031
0.7226	1.51	1500	0.6185	0.4935
0.5777	2.01	2000	0.5600	0.4569
0.4306	2.51	2500	0.4985	0.4229
0.3854	3.02	3000	0.5113	0.4200
0.3161	3.52	3500	0.5197	0.4042
0.2904	4.02	4000	0.4900	0.3936
0.2404	4.52	4500	0.5209	0.3797
0.2546	5.03	5000	0.4836	0.3855
0.2278	5.53	5500	0.5194	0.3676
0.2049	6.03	6000	0.5647	0.4042
0.199	6.53	6500	0.5699	0.3932
0.1932	7.04	7000	0.5498	0.3694
0.1633	7.54	7500	0.5918	0.3686
0.1674	8.04	8000	0.5298	0.3716
0.1496	8.54	8500	0.5788	0.3726
0.1488	9.05	9000	0.5603	0.3664
0.1286	9.55	9500	0.5427	0.3550
0.1364	10.05	10000	0.5794	0.3621
0.1177	10.55	10500	0.5587	0.3606
0.1126	11.06	11000	0.5788	0.3519
0.1272	11.56	11500	0.5859	0.3595
0.1414	12.06	12000	0.5852	0.3586
0.1081	12.56	12500	0.5653	0.3727
0.1073	13.07	13000	0.5653	0.3526
0.0922	13.57	13500	0.5758	0.3583
0.09	14.07	14000	0.5990	0.3599
0.0987	14.57	14500	0.5837	0.3516
0.0823	15.08	15000	0.5639	0.3454
0.0752	15.58	15500	0.5663	0.3542
0.0714	16.08	16000	0.6273	0.3419
0.0693	16.58	16500	0.6389	0.3441
0.0634	17.09	17000	0.6006	0.3409
0.063	17.59	17500	0.6456	0.3444
0.0627	18.09	18000	0.6706	0.3458
0.0519	18.59	18500	0.6370	0.3396
0.059	19.1	19000	0.6602	0.3390
0.0495	19.6	19500	0.6642	0.3364
0.0601	20.1	20000	0.6495	0.3408
0.07	20.6	20500	0.6526	0.3476
0.0517	21.11	21000	0.6265	0.3401
0.0434	21.61	21500	0.6364	0.3372
0.0383	22.11	22000	0.6742	0.3377
0.0372	22.61	22500	0.6499	0.3330
0.0329	23.12	23000	0.6877	0.3307
0.0366	23.62	23500	0.6351	0.3303
0.0372	24.12	24000	0.6547	0.3286
0.031	24.62	24500	0.6757	0.3304
0.0367	25.13	25000	0.6507	0.3312
0.0309	25.63	25500	0.6645	0.3298
0.03	26.13	26000	0.6342	0.3325
0.0274	26.63	26500	0.6614	0.3255
0.0236	27.14	27000	0.6614	0.3222
0.0263	27.64	27500	0.6560	0.3242
0.0264	28.14	28000	0.6337	0.3237
0.0234	28.64	28500	0.6322	0.3208
0.0249	29.15	29000	0.6367	0.3218
0.0252	29.65	29500	0.6348	0.3204
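In this log, validation loss reaches its minimum early (0.4836 at step 5000) and generally trends upward afterwards, while WER keeps improving until the last logged step (0.3204 at step 29500), so which checkpoint counts as "best" depends on the metric tracked. The snippet below picks the best row by each metric from a subset of rows copied from the table; `parse_log` and `best_row` are illustrative helpers.

```python
# A subset of rows copied from the table above:
# training loss, epoch, step, validation loss, WER
LOG = """
4.2767 0.5   500   2.9921 1.0
0.2546 5.03  5000  0.4836 0.3855
0.1364 10.05 10000 0.5794 0.3621
0.0823 15.08 15000 0.5639 0.3454
0.0601 20.1  20000 0.6495 0.3408
0.0367 25.13 25000 0.6507 0.3312
0.0234 28.64 28500 0.6322 0.3208
0.0252 29.65 29500 0.6348 0.3204
"""

FIELDS = ("train_loss", "epoch", "step", "val_loss", "wer")


def parse_log(text: str) -> list[dict]:
    """Parse whitespace-separated log rows into dicts keyed by FIELDS."""
    rows = []
    for line in text.strip().splitlines():
        row = dict(zip(FIELDS, (float(v) for v in line.split())))
        row["step"] = int(row["step"])
        rows.append(row)
    return rows


def best_row(rows: list[dict], metric: str) -> dict:
    """Row with the lowest value of `metric` (lower is better for both loss and WER)."""
    return min(rows, key=lambda r: r[metric])


rows = parse_log(LOG)
print(best_row(rows, "val_loss")["step"])  # 5000
print(best_row(rows, "wer")["step"])       # 29500
```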

Framework versions

Transformers 4.19.2
PyTorch 1.8.2+cu111
Datasets 1.17.0
Tokenizers 0.11.6

📄 License

This model is licensed under the Apache-2.0 license.
