Open-source Speech Recognition Model wav2vec2-base-timit-google-colab - Accurately Identify Speech Content with a Low Error Rate

Wav2vec2 Base Timit Google Colab

Developed by anithapappu

A speech recognition model fine-tuned on the TIMIT dataset based on facebook/wav2vec2-base, achieving a word error rate (WER) of 0.3355 on the evaluation set.

Speech Recognition

Transformers

Open Source License: Apache-2.0 #Speech Recognition #Low Word Error Rate #TIMIT Dataset

Downloads 19

Release Time : 5/23/2022

Model Overview

This model is a fine-tuned version of wav2vec2-base, primarily designed for English speech recognition tasks.

Model Features

Low Word Error Rate

Achieves a word error rate (WER) of 0.3355 on the evaluation set, which corresponds to roughly one word-level error for every three reference words.

Based on wav2vec2 Architecture

Utilizes facebook/wav2vec2-base as the base model, featuring robust speech feature extraction capabilities.

Fine-tuning Optimization

Fine-tuned for 30 epochs with a learning rate of 0.0001 and a linear schedule with 1000 warmup steps.

Model Capabilities

English Speech Recognition

Audio to Text Conversion

Use Cases

Speech Transcription

Meeting Minutes

Automatically convert English meeting recordings into text transcripts

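WER of 0.3355 on the evaluation set (about 66% word accuracy if read as 1 - WER, which is only an approximation because inserted words also count as errors)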

Voice Notes

Convert English voice notes into searchable text

🚀 wav2vec2-base-timit-google-colab

This model is a fine-tuned version of facebook/wav2vec2-base on the TIMIT dataset (the dataset field of the original model card reads "None"). It achieves the following results on the evaluation set:

Loss: 0.5506
Wer: 0.3355
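
For readers unfamiliar with the metric, the following is a minimal sketch of how a word error rate is commonly computed: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` is illustrative, and this is not necessarily the exact implementation used to produce the figures on this page.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat", "the dog sat"))
```

Because insertions are counted against a fixed number of reference words, the ratio can exceed 1.0, which is why a WER slightly above 1 can appear early in training.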

🚀 Quick Start

The model can be used for English speech recognition through the Transformers library. Expect transcription quality in line with the evaluation WER of 0.3355.
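
A minimal inference sketch follows. It rests on several assumptions not confirmed by this page: that the model is published on the Hugging Face Hub as `anithapappu/wav2vec2-base-timit-google-colab` (inferred from the developer and model names), that the repository includes the processor files, and that the input is mono audio sampled at 16 kHz, the rate wav2vec2-base was pretrained on. The `transcribe` helper is hypothetical.

```python
# Assumed Hub identifier, inferred from the developer and model names.
MODEL_ID = "anithapappu/wav2vec2-base-timit-google-colab"
SAMPLE_RATE = 16_000  # wav2vec2-base expects 16 kHz mono audio


def transcribe(waveform, model_id=MODEL_ID):
    """Transcribe a 1-D float waveform (16 kHz) with greedy CTC decoding."""
    # Imported lazily so this module loads without the heavy dependencies.
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()

    inputs = processor(waveform, sampling_rate=SAMPLE_RATE, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits

    # Greedy decoding: pick the most likely token per frame, then let the
    # processor collapse repeats and remove CTC blank tokens.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

A waveform can be obtained with any audio loader that returns a float array, resampled to 16 kHz if necessary, and then passed as `transcribe(waveform)`.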

📚 Documentation

Model Information

Property	Details
Model Type	Fine-tuned version of facebook/wav2vec2-base on the TIMIT dataset (dataset listed as "None" in the original model card)
Evaluation Results	Loss: 0.5506; Wer: 0.3355

Training and Evaluation

Training Hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 1000
num_epochs: 30
mixed_precision_training: Native AMP
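
To illustrate what `lr_scheduler_type: linear` with 1000 warmup steps means in practice, here is a small sketch of a linear warmup followed by linear decay to zero, which is how such a schedule is typically defined. The total number of training steps is not stated on this page, so `total_steps` is a parameter and the value 15000 in the usage lines is purely hypothetical.

```python
BASE_LR = 1e-4       # learning_rate from the list above
WARMUP_STEPS = 1000  # lr_scheduler_warmup_steps from the list above


def linear_schedule_lr(step, total_steps, base_lr=BASE_LR, warmup_steps=WARMUP_STEPS):
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr at the end of warmup to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    total = 15_000  # hypothetical total step count, for illustration only
    for step in (0, 500, 1000, 8000, 15_000):
        print(step, linear_schedule_lr(step, total))
```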

Training Results

Training Loss	Epoch	Step	Validation Loss	Wer
3.4326	1.0	500	1.5832	1.0063
0.8235	2.01	1000	0.5310	0.5134
0.4224	3.01	1500	0.4488	0.4461
0.2978	4.02	2000	0.4243	0.4191
0.232	5.02	2500	0.4532	0.4149
0.1902	6.02	3000	0.4732	0.3912
0.1628	7.03	3500	0.4807	0.3868
0.1437	8.03	4000	0.5295	0.3670
0.1241	9.04	4500	0.4602	0.3810
0.1206	10.04	5000	0.4691	0.3783
0.0984	11.04	5500	0.4500	0.3710
0.0929	12.05	6000	0.5247	0.3550
0.0914	13.05	6500	0.5546	0.3821
0.0742	14.06	7000	0.4874	0.3646
0.0729	15.06	7500	0.5327	0.3934
0.0663	16.06	8000	0.5769	0.3661
0.0575	17.07	8500	0.5191	0.3524
0.0588	18.07	9000	0.5155	0.3360
0.0456	19.08	9500	0.5135	0.3539
0.0444	20.08	10000	0.5380	0.3603
0.0419	21.08	10500	0.5275	0.3467
0.0366	22.09	11000	0.5072	0.3487
0.0331	23.09	11500	0.5450	0.3437
0.0345	24.1	12000	0.5138	0.3431
0.029	25.1	12500	0.5067	0.3413
0.0274	26.1	13000	0.5421	0.3422
0.0243	27.11	13500	0.5456	0.3392
0.0226	28.11	14000	0.5665	0.3368
0.0216	29.12	14500	0.5506	0.3355
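
The table shows that validation loss and WER do not reach their best values at the same checkpoint. The sketch below demonstrates this on a handful of rows copied from the table (an excerpt, not the full log); the variable names are illustrative.

```python
# (step, validation_loss, wer): an excerpt of rows from the table above.
ROWS = [
    (500, 1.5832, 1.0063),
    (2000, 0.4243, 0.4191),
    (4000, 0.5295, 0.3670),
    (9000, 0.5155, 0.3360),
    (14500, 0.5506, 0.3355),
]

# Checkpoint with the lowest validation loss versus the lowest WER.
best_by_loss = min(ROWS, key=lambda row: row[1])
best_by_wer = min(ROWS, key=lambda row: row[2])

if __name__ == "__main__":
    print("lowest validation loss at step", best_by_loss[0])
    print("lowest WER at step", best_by_wer[0])
```

In these rows the validation loss is lowest early in training while the WER keeps improving until the final logged step, so the choice of checkpoint depends on which metric matters for the application.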

Framework Versions

Transformers 4.20.0
Pytorch 1.11.0+cu113
Datasets 1.13.3
Tokenizers 0.12.1

📄 License

This model is licensed under the Apache-2.0 license.
