Open-source Speech Recognition Model wav2vec2-base-timit-demo-google-colab: Free English Speech Recognition

Wav2vec2 Base Timit Demo Google Colab

Developed by mikeluck

This is a speech recognition model fine-tuned from facebook/wav2vec2-base on the TIMIT dataset. It achieves a word error rate (WER) of 0.3384 on the evaluation set.

Speech Recognition

Transformers

Open Source License: Apache-2.0 #Speech Recognition #Low Word Error Rate #TIMIT Dataset

Downloads 38

Release Time : 6/15/2022

Model Overview

This is an English speech recognition model fine-tuned from the wav2vec2 architecture. It transcribes English speech into text.

Model Features

Word Error Rate

Achieves a word error rate (WER) of 0.3384 on the evaluation set, i.e. roughly 34 word errors (substitutions, deletions, and insertions) per 100 reference words.
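WER is the word-level edit distance between the model's transcript and the reference transcript, divided by the number of reference words. The sketch below computes it with a standard dynamic-programming (Levenshtein) alignment; the function name is hypothetical and not the exact metric code used to produce 0.3384, which the card does not show.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev is the previous row of the DP table: prev[j] is the edit distance
    # between the reference words before the current one and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (r != h)  # 0 if the words match
            deletion = prev[j] + 1                 # reference word missed
            insertion = curr[j - 1] + 1            # extra hypothesis word
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of three reference words gives WER = 1/3.
    print(word_error_rate("she had your", "she hid your"))
```

Because insertions are counted, WER is not capped at 1.0: a one-word reference transcribed as three wrong words scores 3.0.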

Based on wav2vec2 Architecture

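Uses facebook/wav2vec2-base as the base model. wav2vec2 learns speech representations directly from raw audio through self-supervised pretraining, so comparatively little labeled data is needed for fine-tuning.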
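wav2vec2 begins with a stack of seven 1-D convolutions that turns raw 16 kHz audio into roughly one feature frame every 20 ms, which the Transformer layers then contextualize. The sketch below computes the number of frames for a given input length. The kernel sizes and strides are the defaults of the Hugging Face `Wav2Vec2Config` (`conv_kernel`, `conv_stride`); it is an assumption that this checkpoint keeps them.

```python
# Kernel sizes and strides of the seven convolutions in the wav2vec2 feature
# encoder (assumed: Wav2Vec2Config defaults, unchanged by fine-tuning).
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)


def num_output_frames(num_samples: int) -> int:
    """Number of encoder frames produced for a waveform of num_samples."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        # Unpadded convolution: floor((L - kernel) / stride) + 1.
        length = (length - kernel) // stride + 1
    # Inputs shorter than the 400-sample receptive field yield no frames.
    return max(length, 0)


if __name__ == "__main__":
    sample_rate = 16_000  # wav2vec2-base expects 16 kHz mono audio
    for seconds in (1, 5, 10):
        print(seconds, "s ->", num_output_frames(seconds * sample_rate), "frames")
```

One second of audio yields 49 frames and ten seconds yields 499, so each output frame covers about 20 ms.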

Lightweight Model

The base configuration is considerably smaller than wav2vec2-large, which makes it more practical to deploy in resource-constrained environments.
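A rough way to judge "lightweight" is the memory needed to hold the weights. The sketch assumes about 95 million parameters, the figure the wav2vec 2.0 paper reports for the BASE configuration; the exact count of this fine-tuned checkpoint (including its output head) may differ slightly, and the int8 row assumes a quantized model, which the card does not mention. Activations and framework overhead are not included.

```python
# Assumed parameter count of the wav2vec2 BASE configuration (approximate).
NUM_PARAMS = 95_000_000

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_mb(num_params: int, dtype: str) -> float:
    """Memory needed just for the weights, in megabytes (10**6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: ~{weight_memory_mb(NUM_PARAMS, dtype):.0f} MB")
```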

Model Capabilities

English Speech Recognition

Speech-to-Text

Audio Content Transcription
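wav2vec2 models fine-tuned for transcription usually use a CTC output head that predicts one token per ~20 ms frame; text is recovered by merging repeated tokens and dropping the CTC "blank". The card does not publish the vocabulary, so the sketch below assumes the common recipe in which `[PAD]` serves as the blank and `|` marks word boundaries. In the Transformers library, `Wav2Vec2Processor.batch_decode` performs this step for you; the function here is a hypothetical stand-alone illustration of greedy CTC decoding.

```python
from itertools import groupby
from typing import Sequence

# Assumed special tokens (not confirmed by the card).
BLANK = "[PAD]"
WORD_DELIMITER = "|"


def ctc_greedy_decode(frame_tokens: Sequence[str]) -> str:
    """Turn the per-frame argmax tokens of a CTC model into text."""
    # 1. Merge consecutive duplicates: "h h i" -> "h i".
    collapsed = [token for token, _ in groupby(frame_tokens)]
    # 2. Drop blanks; a blank between two identical letters keeps both.
    chars = [token for token in collapsed if token != BLANK]
    # 3. Map word delimiters to spaces and normalise whitespace.
    text = "".join(" " if c == WORD_DELIMITER else c for c in chars)
    return " ".join(text.split())


if __name__ == "__main__":
    frames = ["h", "h", BLANK, "i", "|", "|", "y", "o", "u", "u"]
    print(ctc_greedy_decode(frames))
```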

Use Cases

Speech Transcription

Automatic Meeting Transcription

Automatically converts English meeting recordings into text transcripts.

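Word accuracy approximately 66.16% (1 - WER = 1 - 0.3384). This is only an approximation, since WER also counts inserted words, and it was measured on the model's evaluation set rather than on meeting recordings.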
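The 66.16% figure comes from subtracting WER from 1. The sketch below shows that arithmetic and why "1 - WER" is not a true accuracy: WER is built from error counts over the number of reference words, so many insertions can push it above 1. The error counts in the second example are hypothetical.

```python
def wer_from_counts(substitutions: int, deletions: int,
                    insertions: int, reference_words: int) -> float:
    """WER = (S + D + I) / N, with N the number of reference words."""
    return (substitutions + deletions + insertions) / reference_words


def approx_word_accuracy(wer: float) -> float:
    """The '1 - WER' figure; can be negative when WER exceeds 1."""
    return 1.0 - wer


if __name__ == "__main__":
    print(round(approx_word_accuracy(0.3384), 4))  # reported evaluation WER
    # Hypothetical case: 2 reference words, 3 spurious inserted words.
    wer = wer_from_counts(0, 0, 3, 2)
    print(wer, approx_word_accuracy(wer))
```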

Voice Note Conversion

Converts personal voice notes into searchable text.

Assistive Technology

Real-time Caption Generation

Generates real-time captions for English videos or live streams.
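wav2vec2-base attends over its whole input, so it is not a streaming model. A common approximation for live captions is to transcribe short, overlapping windows of audio as they arrive. The sketch below only computes those windows; the 10 s chunk length and 2 s overlap are hypothetical choices, not values from the card, and merging the overlapping transcripts is left out.

```python
from typing import Iterator, Tuple


def chunk_spans(num_samples: int, sample_rate: int = 16_000,
                chunk_s: float = 10.0,
                overlap_s: float = 2.0) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) sample indices of overlapping audio windows."""
    chunk = int(chunk_s * sample_rate)
    step = chunk - int(overlap_s * sample_rate)
    if step <= 0:
        raise ValueError("overlap must be shorter than the chunk")
    if num_samples <= 0:
        return
    start = 0
    while True:
        end = min(start + chunk, num_samples)
        yield start, end
        if end >= num_samples:  # last window reached the end of the audio
            break
        start += step


if __name__ == "__main__":
    # 35 seconds of 16 kHz audio split into 10 s windows with 2 s overlap.
    for start, end in chunk_spans(35 * 16_000):
        print(f"{start / 16_000:.0f}s - {end / 16_000:.0f}s")
```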

🚀 wav2vec2-base-timit-demo-google-colab

This model is a fine-tuned version of facebook/wav2vec2-base for English speech recognition.

🚀 Quick Start

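This model is a fine-tuned version of facebook/wav2vec2-base. The auto-generated card lists the training dataset as "None" (it was not recorded), although the model name indicates TIMIT. It achieves the following results on the evaluation set: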

Loss: 0.5351
WER: 0.3384

📚 Documentation

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 1000
num_epochs: 30
mixed_precision_training: Native AMP
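With `lr_scheduler_type: linear` and 1000 warmup steps, the learning rate rises linearly from 0 to 0.0001 and then decays linearly to 0 at the final step. The sketch mirrors that shape (the same formula as Transformers' `get_linear_schedule_with_warmup`). The card does not state the total number of steps; the default of 15,000 is an assumption estimated from the results table, where step 14,500 corresponds to epoch 29.12 of 30.

```python
def linear_warmup_decay_lr(step: int, base_lr: float = 1e-4,
                           warmup_steps: int = 1000,
                           total_steps: int = 15_000) -> float:
    """Learning rate at a given optimizer step (total_steps is assumed)."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for step in (0, 500, 1000, 8000, 15_000):
        print(step, linear_warmup_decay_lr(step))
```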

Training results

Training Loss	Epoch	Step	Validation Loss	WER
3.6311	1.0	500	2.6700	1.0
1.0104	2.01	1000	0.5289	0.5277
0.4483	3.01	1500	0.4576	0.4623
0.3089	4.02	2000	0.4483	0.4255
0.2278	5.02	2500	0.4463	0.4022
0.1886	6.02	3000	0.4653	0.3938
0.1578	7.03	3500	0.4624	0.3855
0.1429	8.03	4000	0.4420	0.3854
0.1244	9.04	4500	0.4980	0.3787
0.1126	10.04	5000	0.4311	0.3785
0.1082	11.04	5500	0.5114	0.3782
0.0888	12.05	6000	0.5392	0.3725
0.0835	13.05	6500	0.6011	0.3941
0.074	14.06	7000	0.5030	0.3652
0.0667	15.06	7500	0.5041	0.3583
0.0595	16.06	8000	0.5125	0.3605
0.0578	17.07	8500	0.5206	0.3592
0.0573	18.07	9000	0.5208	0.3643
0.0469	19.08	9500	0.4670	0.3537
0.0442	20.08	10000	0.5388	0.3497
0.0417	21.08	10500	0.5213	0.3581
0.0361	22.09	11000	0.5096	0.3465
0.0338	23.09	11500	0.5178	0.3459
0.0333	24.1	12000	0.5240	0.3490
0.0256	25.1	12500	0.5438	0.3464
0.0248	26.1	13000	0.5182	0.3412
0.0231	27.11	13500	0.5628	0.3423
0.0228	28.11	14000	0.5416	0.3419
0.0223	29.12	14500	0.5351	0.3384
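The table shows that validation loss and WER disagree about the best checkpoint: validation loss is lowest at step 5000, while WER keeps improving to step 14500, whose values match the reported evaluation results. The sketch below uses the (step, validation loss, WER) columns copied from the table to find both; choosing checkpoints by WER is one reasonable policy, not one the card states.

```python
# (step, validation_loss, wer) rows copied from the training-results table.
ROWS = [
    (500, 2.6700, 1.0), (1000, 0.5289, 0.5277), (1500, 0.4576, 0.4623),
    (2000, 0.4483, 0.4255), (2500, 0.4463, 0.4022), (3000, 0.4653, 0.3938),
    (3500, 0.4624, 0.3855), (4000, 0.4420, 0.3854), (4500, 0.4980, 0.3787),
    (5000, 0.4311, 0.3785), (5500, 0.5114, 0.3782), (6000, 0.5392, 0.3725),
    (6500, 0.6011, 0.3941), (7000, 0.5030, 0.3652), (7500, 0.5041, 0.3583),
    (8000, 0.5125, 0.3605), (8500, 0.5206, 0.3592), (9000, 0.5208, 0.3643),
    (9500, 0.4670, 0.3537), (10000, 0.5388, 0.3497), (10500, 0.5213, 0.3581),
    (11000, 0.5096, 0.3465), (11500, 0.5178, 0.3459), (12000, 0.5240, 0.3490),
    (12500, 0.5438, 0.3464), (13000, 0.5182, 0.3412), (13500, 0.5628, 0.3423),
    (14000, 0.5416, 0.3419), (14500, 0.5351, 0.3384),
]

best_by_loss = min(ROWS, key=lambda row: row[1])
best_by_wer = min(ROWS, key=lambda row: row[2])

if __name__ == "__main__":
    print("lowest validation loss:", best_by_loss)
    print("lowest WER:", best_by_wer)
```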

Framework versions

Transformers 4.17.0
Pytorch 1.11.0+cu113
Datasets 1.18.3
Tokenizers 0.12.1

📄 License

This model is licensed under the Apache-2.0 license.
