wav2vec2-base-timit-demo-google-colab: Open-Source Speech Recognition Model for Free Speech-to-Text Conversion

Wav2vec2 Base Timit Demo Google Colab

Developed by pannaga

This speech recognition model is a version of facebook/wav2vec2-base fine-tuned on the TIMIT dataset and trained in Google Colab.

Speech Recognition

Transformers

Open Source License: Apache-2.0 #Speech recognition #TIMIT dataset #Low word error rate

Downloads 16

Release Time: 6/30/2022

Model Overview

A fine-tuned model for English speech recognition, based on the wav2vec2 architecture, suitable for speech-to-text tasks.

Model Features

Efficient fine-tuning

Fine-tuning on the TIMIT dataset adapts the pretrained wav2vec2-base model to speech recognition, reaching a word error rate (WER) of 0.3437 on the evaluation set

Google Colab compatibility

The model was trained in Google Colab, which makes the setup easy to reproduce and experiment with

Relatively lightweight

Built on the wav2vec2-base architecture, it suits resource-limited environments better than larger models

Model Capabilities

English speech recognition

Speech-to-text

Audio feature extraction

Use Cases

Speech processing

Speech transcription

Convert English speech content into text

The word error rate (WER) on the evaluation set is 0.3437

Speech command recognition

Recognize simple speech commands and instructions
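
The WER quoted for speech transcription is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal pure-Python illustration of that definition. The function name `word_error_rate` is hypothetical, and this is not necessarily the implementation used to produce the reported score.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of five reference words gives a WER of 0.2.
print(word_error_rate("she had your dark suit", "she had a dark suit"))
```

Read this way, a WER of 0.3437 means roughly one word-level error for every three reference words.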

🚀 wav2vec2-base-timit-demo-google-colab

This model is a fine-tuned version of facebook/wav2vec2-base, adapted for English speech recognition.

🚀 Quick Start

This model is a fine-tuned version of facebook/wav2vec2-base. The auto-generated model card lists the dataset as "None", although the model summary identifies it as TIMIT. It achieves the following results on the evaluation set:

Loss: 0.5480
WER: 0.3437
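
As a starting point for trying the model, the sketch below loads it through the `transformers` automatic speech recognition pipeline. The Hub id in `MODEL_ID` is an assumption pieced together from the developer name and model name on this page, and `transcribe` and the audio path are hypothetical; adjust them to your setup.

```python
# Assumed Hub id (developer name + model name); verify it before use.
MODEL_ID = "pannaga/wav2vec2-base-timit-demo-google-colab"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Return the text the model predicts for one audio file."""
    # Imported lazily so this module still loads where transformers
    # is not installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # Given a file path, the pipeline decodes the audio with ffmpeg and
    # resamples it to the rate the feature extractor expects
    # (16 kHz for wav2vec2-base).
    return asr(audio_path)["text"]


# Example call (downloads the model on first use):
# print(transcribe("sample.wav"))
```

Building the pipeline once and reusing it across files avoids reloading the weights on every call.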

📚 Documentation

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 1000
num_epochs: 30
mixed_precision_training: Native AMP
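
To make the listed hyperparameters concrete, the sketch below maps them onto keyword arguments accepted by `transformers.TrainingArguments` and reproduces the linear warmup and decay schedule in plain Python. The mapping assumes a single device and no gradient accumulation. `TRAINING_KWARGS`, `linear_warmup_lr`, and `ESTIMATED_TOTAL_STEPS` are hypothetical names, and the total step count is an estimate inferred from the step and epoch columns of the training results (about 498 steps per epoch).

```python
# Keyword arguments that mirror the hyperparameter list; they could be passed
# as transformers.TrainingArguments(output_dir="...", **TRAINING_KWARGS).
TRAINING_KWARGS = {
    "learning_rate": 1e-4,
    "per_device_train_batch_size": 8,  # assumes a single device
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "warmup_steps": 1000,
    "num_train_epochs": 30,
    "fp16": True,  # native AMP mixed precision; requires a CUDA GPU
}

# Estimate only: about 498 optimizer steps per epoch over 30 epochs.
ESTIMATED_TOTAL_STEPS = 498 * 30


def linear_warmup_lr(step, total_steps=ESTIMATED_TOTAL_STEPS,
                     base_lr=1e-4, warmup_steps=1000):
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 500, 1000, 7970, ESTIMATED_TOTAL_STEPS):
    print(step, linear_warmup_lr(step))
```

Under these assumptions the learning rate peaks at 0.0001 at step 1000 and then falls linearly to zero by the end of training.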

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
3.5237	1.0	500	1.7277	0.9752
0.8339	2.01	1000	0.5413	0.5316
0.4277	3.01	1500	0.4732	0.4754
0.2907	4.02	2000	0.4571	0.4476
0.2254	5.02	2500	0.4611	0.4105
0.1911	6.02	3000	0.4448	0.4072
0.1595	7.03	3500	0.4517	0.3843
0.1377	8.03	4000	0.4551	0.3881
0.1197	9.04	4500	0.4853	0.3772
0.1049	10.04	5000	0.4617	0.3707
0.097	11.04	5500	0.4633	0.3622
0.0872	12.05	6000	0.4635	0.3690
0.0797	13.05	6500	0.5196	0.3749
0.0731	14.06	7000	0.5029	0.3639
0.0667	15.06	7500	0.5053	0.3614
0.0618	16.06	8000	0.5627	0.3638
0.0562	17.07	8500	0.5484	0.3577
0.0567	18.07	9000	0.5163	0.3560
0.0452	19.08	9500	0.5012	0.3538
0.044	20.08	10000	0.4931	0.3534
0.0424	21.08	10500	0.5147	0.3519
0.0356	22.09	11000	0.5540	0.3521
0.0322	23.09	11500	0.5565	0.3509
0.0333	24.1	12000	0.5315	0.3428
0.0281	25.1	12500	0.5284	0.3425
0.0261	26.1	13000	0.5101	0.3446
0.0256	27.11	13500	0.5432	0.3415
0.0229	28.11	14000	0.5484	0.3446
0.0212	29.12	14500	0.5480	0.3437
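
The table can be scanned programmatically to see which checkpoints scored best. The sketch below copies the step, validation loss, and WER columns from the table and picks out the minima. `RESULTS` and the other variable names are hypothetical.

```python
# (step, validation_loss, wer) copied from the training results table.
RESULTS = [
    (500, 1.7277, 0.9752), (1000, 0.5413, 0.5316), (1500, 0.4732, 0.4754),
    (2000, 0.4571, 0.4476), (2500, 0.4611, 0.4105), (3000, 0.4448, 0.4072),
    (3500, 0.4517, 0.3843), (4000, 0.4551, 0.3881), (4500, 0.4853, 0.3772),
    (5000, 0.4617, 0.3707), (5500, 0.4633, 0.3622), (6000, 0.4635, 0.3690),
    (6500, 0.5196, 0.3749), (7000, 0.5029, 0.3639), (7500, 0.5053, 0.3614),
    (8000, 0.5627, 0.3638), (8500, 0.5484, 0.3577), (9000, 0.5163, 0.3560),
    (9500, 0.5012, 0.3538), (10000, 0.4931, 0.3534), (10500, 0.5147, 0.3519),
    (11000, 0.5540, 0.3521), (11500, 0.5565, 0.3509), (12000, 0.5315, 0.3428),
    (12500, 0.5284, 0.3425), (13000, 0.5101, 0.3446), (13500, 0.5432, 0.3415),
    (14000, 0.5484, 0.3446), (14500, 0.5480, 0.3437),
]

best_wer = min(RESULTS, key=lambda row: row[2])   # lowest WER
best_loss = min(RESULTS, key=lambda row: row[1])  # lowest validation loss
final = RESULTS[-1]                               # last logged checkpoint

print("lowest WER:", best_wer)
print("lowest validation loss:", best_loss)
print("final checkpoint:", final)
```

In this table the lowest WER (0.3415) appears at step 13500 and the lowest validation loss (0.4448) at step 3000, while the reported final metrics (loss 0.5480, WER 0.3437) correspond to the last logged step, 14500.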

Framework versions

Transformers 4.17.0
PyTorch 1.11.0+cu113
Datasets 1.18.3
Tokenizers 0.12.1
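
When reproducing the run, it can help to compare the local environment with the versions listed above. The sketch below does this with the standard library's `importlib.metadata`. The package names are the usual PyPI distribution names, which is an assumption about how the libraries were installed, and `EXPECTED` and `compare_versions` are hypothetical names.

```python
from importlib import metadata

# Versions listed on this page, keyed by assumed PyPI distribution name.
EXPECTED = {
    "transformers": "4.17.0",
    "torch": "1.11.0+cu113",
    "datasets": "1.18.3",
    "tokenizers": "0.12.1",
}


def compare_versions(expected=EXPECTED):
    """Map each package to (expected_version, installed_version or None)."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed locally
        report[package] = (wanted, installed)
    return report


for package, (wanted, installed) in compare_versions().items():
    status = "match" if wanted == installed else "differs"
    print(f"{package}: expected {wanted}, installed {installed} ({status})")
```

A mismatch does not necessarily prevent the model from loading; the report only shows where the environment differs from the one used for training.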

📄 License

This model is licensed under the Apache 2.0 license.
