# wav2vec2-base-timit-demo-google-colab Open-source Speech Recognition Model


Developed by atgarcia

A speech recognition model based on facebook/wav2vec2-base and fine-tuned on the TIMIT dataset, intended for English speech-to-text tasks.

Speech Recognition

Transformers

Open Source License: Apache-2.0

Tags: #Speech recognition optimization #Low word error rate #TIMIT dataset

Downloads 19

Release Time: 5/17/2022

Model Overview

This model is a fine-tuned version of wav2vec2-base for English speech recognition. It reaches a word error rate (WER) of 0.333 on the TIMIT evaluation set.

Model Features

Efficient fine-tuning

Fine-tuned from the pre-trained wav2vec2-base model. Over the 30-epoch run, the evaluation WER falls from 1.0011 after the first epoch to 0.3330 at the final logged checkpoint.

Low word error rate

Achieves a word error rate (WER) of 0.333 on the evaluation set, which is roughly one word error for every three reference words.

Lightweight

Built on the wav2vec2-base architecture, the model is moderate in size and may suit deployment in resource-limited environments.

Model Capabilities

English speech recognition

Real-time speech-to-text

High-accuracy transcription

Use Cases

Speech transcription

Meeting minutes

Automatically transcribe English meeting recordings into text

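The reported evaluation WER is 0.333 on TIMIT, roughly one error per three reference words. WER counts substitutions, deletions, and insertions and can exceed 1, so it is not simply 1 minus accuracy, and results on meeting recordings may differ.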

Voice assistant

Serves as the foundational recognition engine for voice assistants

Education

Pronunciation assessment

Used to evaluate the pronunciation accuracy of English learners

🚀 wav2vec2-base-timit-demo-google-colab

This model is a fine-tuned version of facebook/wav2vec2-base. The auto-generated card lists the training dataset as None; the model name and the description above point to TIMIT. It achieves the following results on the evaluation set:

Loss: 0.5255
Wer: 0.3330
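
To put the Wer figure in context: word error rate is the word-level edit distance between a hypothesis and its reference, divided by the number of reference words. The following is a minimal pure-Python sketch of that definition. The function name `word_error_rate` and the example sentences are illustrative assumptions, and this is not necessarily the metric implementation used to produce the numbers above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (not from the model card): splits on whitespace
    and applies no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution out of three reference words.
print(word_error_rate("the cat sat", "the dog sat"))
# Two insertions against a one-word reference: WER is above 1.
print(word_error_rate("hello", "oh hello there"))
```

Because insertions count as errors, the ratio can exceed 1, which is consistent with the Wer of 1.0011 logged after the first epoch in the training results.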

🚀 Quick Start

The original model card does not provide quick-start instructions.
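
As a rough starting point, a fine-tuned wav2vec2 checkpoint can usually be loaded through the Transformers `pipeline` API. The sketch below makes several assumptions that are not confirmed by the card: the Hub identifier `atgarcia/wav2vec2-base-timit-demo-google-colab` is inferred from the developer and model names, the helper `transcribe` and the file name in the comment are hypothetical, and decoding an audio file by path generally requires ffmpeg to be installed.

```python
# Assumed Hub id, inferred from "Developed by atgarcia" and the model name.
MODEL_ID = "atgarcia/wav2vec2-base-timit-demo-google-colab"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file with the automatic-speech-recognition pipeline.

    Hypothetical helper: transformers is imported lazily so that this
    module can be imported without the library (or the model) present.
    """
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(audio_path)  # returns a dict with a "text" field
    return result["text"]


# Example call (hypothetical file name; downloads the model on first use):
# print(transcribe("sample.wav"))
```

wav2vec2-base was pre-trained on 16 kHz audio, so recordings at other sampling rates should be resampled to 16 kHz before or during decoding.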

📚 Documentation

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 1000
num_epochs: 30
mixed_precision_training: Native AMP
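
The hyperparameters above can be sketched as keyword arguments using the names that `transformers.TrainingArguments` accepts, together with the shape of a linear schedule with warmup. This is an illustration under stated assumptions, not the original training script: mapping the batch sizes to per-device values assumes a single GPU, `fp16=True` is taken as the equivalent of Native AMP, and `total_steps=15_000` is a rounded estimate from the results table (about 498 steps per epoch over 30 epochs), not a logged value.

```python
# Values come from the hyperparameter list; key names follow
# transformers.TrainingArguments. Per-device batch sizes assume one GPU.
training_kwargs = {
    "learning_rate": 1e-4,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "warmup_steps": 1000,
    "num_train_epochs": 30,
    "fp16": True,  # assumed equivalent of "Native AMP"
}


def linear_schedule_lr(step: int,
                       base_lr: float = 1e-4,
                       warmup_steps: int = 1000,
                       total_steps: int = 15_000) -> float:
    """Learning rate at a given step for linear warmup then linear decay.

    total_steps is an assumed, rounded estimate (not from the card).
    """
    if total_steps <= warmup_steps:
        raise ValueError("total_steps must exceed warmup_steps")
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * step / warmup_steps
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


for step in (0, 500, 1000, 8000, 15_000):
    print(step, linear_schedule_lr(step))
```

With these settings the learning rate peaks at 0.0001 at step 1000, which falls during the third epoch according to the results table.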

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
3.5942	1.0	500	2.3849	1.0011
0.9765	2.01	1000	0.5907	0.5202
0.4424	3.01	1500	0.4547	0.4661
0.3008	4.02	2000	0.4194	0.4228
0.2316	5.02	2500	0.3933	0.4099
0.1921	6.02	3000	0.4532	0.3965
0.1561	7.03	3500	0.4315	0.3777
0.1378	8.03	4000	0.4463	0.3847
0.1222	9.04	4500	0.4402	0.3784
0.1076	10.04	5000	0.4253	0.3735
0.0924	11.04	5500	0.4844	0.3732
0.0866	12.05	6000	0.4758	0.3646
0.086	13.05	6500	0.6395	0.4594
0.0763	14.06	7000	0.4951	0.3647
0.0684	15.06	7500	0.4870	0.3577
0.0616	16.06	8000	0.5442	0.3591
0.0594	17.07	8500	0.5305	0.3606
0.0613	18.07	9000	0.5434	0.3546
0.0473	19.08	9500	0.4818	0.3532
0.0463	20.08	10000	0.5086	0.3514
0.042	21.08	10500	0.5017	0.3484
0.0365	22.09	11000	0.5129	0.3536
0.0336	23.09	11500	0.5411	0.3433
0.0325	24.1	12000	0.5307	0.3424
0.0282	25.1	12500	0.5261	0.3404
0.0245	26.1	13000	0.5306	0.3388
0.0257	27.11	13500	0.5242	0.3369
0.0234	28.11	14000	0.5216	0.3359
0.0221	29.12	14500	0.5255	0.3330
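
One way to read the table above is to compare the checkpoint with the lowest validation loss against the one with the lowest Wer. The sketch below copies the logged rows into a Python list and finds both; the variable names are illustrative and nothing beyond the table values is assumed.

```python
# (training_loss, epoch, step, validation_loss, wer), copied from the table.
ROWS = [
    (3.5942, 1.0, 500, 2.3849, 1.0011),
    (0.9765, 2.01, 1000, 0.5907, 0.5202),
    (0.4424, 3.01, 1500, 0.4547, 0.4661),
    (0.3008, 4.02, 2000, 0.4194, 0.4228),
    (0.2316, 5.02, 2500, 0.3933, 0.4099),
    (0.1921, 6.02, 3000, 0.4532, 0.3965),
    (0.1561, 7.03, 3500, 0.4315, 0.3777),
    (0.1378, 8.03, 4000, 0.4463, 0.3847),
    (0.1222, 9.04, 4500, 0.4402, 0.3784),
    (0.1076, 10.04, 5000, 0.4253, 0.3735),
    (0.0924, 11.04, 5500, 0.4844, 0.3732),
    (0.0866, 12.05, 6000, 0.4758, 0.3646),
    (0.086, 13.05, 6500, 0.6395, 0.4594),
    (0.0763, 14.06, 7000, 0.4951, 0.3647),
    (0.0684, 15.06, 7500, 0.4870, 0.3577),
    (0.0616, 16.06, 8000, 0.5442, 0.3591),
    (0.0594, 17.07, 8500, 0.5305, 0.3606),
    (0.0613, 18.07, 9000, 0.5434, 0.3546),
    (0.0473, 19.08, 9500, 0.4818, 0.3532),
    (0.0463, 20.08, 10000, 0.5086, 0.3514),
    (0.042, 21.08, 10500, 0.5017, 0.3484),
    (0.0365, 22.09, 11000, 0.5129, 0.3536),
    (0.0336, 23.09, 11500, 0.5411, 0.3433),
    (0.0325, 24.1, 12000, 0.5307, 0.3424),
    (0.0282, 25.1, 12500, 0.5261, 0.3404),
    (0.0245, 26.1, 13000, 0.5306, 0.3388),
    (0.0257, 27.11, 13500, 0.5242, 0.3369),
    (0.0234, 28.11, 14000, 0.5216, 0.3359),
    (0.0221, 29.12, 14500, 0.5255, 0.3330),
]

# Pick the rows that minimize validation loss and WER respectively.
best_by_loss = min(ROWS, key=lambda row: row[3])
best_by_wer = min(ROWS, key=lambda row: row[4])

print("lowest validation loss:", best_by_loss[3], "at step", best_by_loss[2])
print("lowest WER:", best_by_wer[4], "at step", best_by_wer[2])
```

The two minima fall at different checkpoints: validation loss is lowest at step 2500 (0.3933), while Wer keeps improving and is lowest at the final logged step, 14500 (0.3330).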

Framework versions

Transformers 4.17.0
Pytorch 1.11.0+cu113
Datasets 1.18.3
Tokenizers 0.12.1
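
When trying to reproduce results, it can help to compare a local environment against the versions listed above. The sketch below uses the standard-library `importlib.metadata`; the helper name `compare_versions` is hypothetical, and mapping "Pytorch" to the `torch` distribution name is an assumption about how the package was installed.

```python
from importlib import metadata

# Versions as listed in the card; "torch" is the assumed distribution
# name for the "Pytorch" entry.
CARD_VERSIONS = {
    "transformers": "4.17.0",
    "torch": "1.11.0+cu113",
    "datasets": "1.18.3",
    "tokenizers": "0.12.1",
}


def compare_versions(expected=CARD_VERSIONS):
    """Return {package: (version_in_card, installed_version_or_None)}."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed locally
        report[package] = (wanted, installed)
    return report


for name, (wanted, installed) in compare_versions().items():
    status = "match" if installed == wanted else "differs"
    print(f"{name}: card={wanted} installed={installed} ({status})")
```

A mismatch does not necessarily prevent the model from loading; the listed versions only record the environment used for training.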

📄 License

This model is licensed under the Apache-2.0 license.
