Wav2vec2-base-timit-demo-google-colab Open-source Speech Recognition Model

Developed by patrickvonplaten

This is a speech recognition model based on facebook/wav2vec2-base and fine-tuned on the TIMIT dataset, primarily used for English speech-to-text tasks.

Speech Recognition

Transformers

Open Source License: Apache-2.0

Tags: #Speech Recognition #Low Word Error Rate #TIMIT Dataset

Downloads 26

Release Time: 5/10/2022

Model Overview

This is a speech recognition model based on the wav2vec2 architecture, fine-tuned on the TIMIT dataset, capable of converting English speech into text.

Model Features

Based on wav2vec2 Architecture

Builds on Facebook's wav2vec2-base checkpoint, whose pretrained encoder extracts speech features directly from raw audio.

Fine-tuned on TIMIT Dataset

Fine-tuned on the TIMIT speech corpus for English speech recognition.

Relatively Low Word Error Rate

Achieves a word error rate (WER) of 0.337 on the evaluation set.
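
For readers unfamiliar with the metric, here is a minimal sketch of how a word error rate can be computed: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions, not taken from the model's evaluation code, and the reported 0.337 was presumably computed over the whole evaluation set rather than a single sentence.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref_words)


# Illustrative sentences (not from TIMIT): one substitution and one deletion
# against six reference words.
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{example_wer:.4f}")
```

A WER of 0.337 therefore means roughly one word-level error for every three reference words.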

Model Capabilities

English Speech Recognition

Speech-to-Text

Use Cases

Speech Transcription

English Speech Transcription

Convert English speech content into text

Word error rate 0.337

🚀 wav2vec2-base-timit-demo-google-colab

This model is a fine-tuned version of facebook/wav2vec2-base on the TIMIT dataset, intended for English speech-to-text tasks.

🚀 Quick Start

After 30 epochs of fine-tuning, the model achieves the following results on the evaluation set:

Loss: 0.5185
WER: 0.3370
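
The card itself gives no usage snippet, so the following is only a sketch of how a wav2vec2 checkpoint of this kind is typically run with the Transformers library. It assumes the checkpoint is published on the Hugging Face Hub under the developer's namespace (the `MODEL_ID` below is inferred from the developer and model name on this page), that it ships processor files, and that the input is mono audio sampled at 16 kHz. The function name `transcribe` is hypothetical.

```python
# Assumed Hub identifier, inferred from the developer and model name.
MODEL_ID = "patrickvonplaten/wav2vec2-base-timit-demo-google-colab"


def transcribe(waveform, sampling_rate=16_000, model_id=MODEL_ID):
    """Greedy CTC transcription of a 1-D float waveform (assumed 16 kHz mono)."""
    # Imported lazily so this module can be loaded without the heavy dependencies.
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()

    # Normalise the raw audio and convert it into a batch of tensors.
    inputs = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits

    # Pick the most likely token per frame; the processor collapses repeats
    # and removes blank tokens when decoding.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Calling `transcribe(samples)` with a NumPy array or list of audio samples would download the weights on first use and return a single transcript string.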

📚 Documentation

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

🔧 Technical Details

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 1000
num_epochs: 30
mixed_precision_training: Native AMP
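
As a rough sketch, the hyperparameters above can be written as keyword arguments accepted by `transformers.TrainingArguments`, and the linear schedule with 1000 warmup steps can be reproduced in a few lines of plain Python. Two assumptions are made here: training ran on a single device (so the reported batch size is the per-device batch size), and the run had about 15,000 optimizer steps in total, which is an estimate from the roughly 500 steps per epoch shown in the results table rather than a figure stated on the card. `TRAINING_KWARGS`, `ASSUMED_TOTAL_STEPS`, and `linear_schedule_lr` are illustrative names.

```python
# Hyperparameters from the list above as TrainingArguments keyword arguments.
# Assumes a single device, so train_batch_size maps to the per-device value.
TRAINING_KWARGS = {
    "learning_rate": 1e-4,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "warmup_steps": 1000,
    "num_train_epochs": 30,
    "fp16": True,  # corresponds to "Native AMP" mixed precision
}

# Assumption: ~500 steps per epoch * 30 epochs; the card does not state this.
ASSUMED_TOTAL_STEPS = 15_000


def linear_schedule_lr(
    step,
    base_lr=TRAINING_KWARGS["learning_rate"],
    warmup_steps=TRAINING_KWARGS["warmup_steps"],
    total_steps=ASSUMED_TOTAL_STEPS,
):
    """Learning rate at a given step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 500, 1000, 8000, 15_000):
    print(step, linear_schedule_lr(step))
```

With the Transformers library installed, the dictionary could be passed as `TrainingArguments(output_dir="out", **TRAINING_KWARGS)`, where `output_dir="out"` is a placeholder path.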

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
3.5137	1.0	500	1.6719	0.9580
0.8324	2.01	1000	0.5546	0.5341
0.4365	3.01	1500	0.4567	0.4635
0.3058	4.02	2000	0.4429	0.4454
0.2284	5.02	2500	0.4734	0.4186
0.1892	6.02	3000	0.4191	0.4030
0.1542	7.03	3500	0.4522	0.3985
0.1364	8.03	4000	0.4749	0.3922
0.1239	9.04	4500	0.4950	0.3977
0.1092	10.04	5000	0.4468	0.3779
0.0956	11.04	5500	0.4897	0.3789
0.0897	12.05	6000	0.4927	0.3718
0.0792	13.05	6500	0.5242	0.3699
0.0731	14.06	7000	0.5202	0.3772
0.0681	15.06	7500	0.5046	0.3637
0.062	16.06	8000	0.5336	0.3664
0.0556	17.07	8500	0.5017	0.3633
0.0556	18.07	9000	0.5466	0.3736
0.0461	19.08	9500	0.5489	0.3566
0.0439	20.08	10000	0.5399	0.3559
0.0397	21.08	10500	0.5154	0.3539
0.0346	22.09	11000	0.5170	0.3513
0.0338	23.09	11500	0.5236	0.3492
0.0342	24.1	12000	0.5288	0.3493
0.0282	25.1	12500	0.5147	0.3449
0.0251	26.1	13000	0.5092	0.3442
0.0268	27.11	13500	0.5093	0.3413
0.021	28.11	14000	0.5310	0.3399
0.022	29.12	14500	0.5185	0.3370
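
One pattern in the table is that validation loss reaches its minimum early (0.4191 at step 3000) while WER keeps improving until the final step. The short sketch below illustrates this by selecting the best row under each criterion. It uses only a handful of rows copied from the table, and the helper name `best_row` is hypothetical.

```python
# A few rows copied from the table above:
# (training_loss, epoch, step, validation_loss, wer)
ROWS = [
    (3.5137, 1.0, 500, 1.6719, 0.9580),
    (0.1892, 6.02, 3000, 0.4191, 0.4030),
    (0.1092, 10.04, 5000, 0.4468, 0.3779),
    (0.0439, 20.08, 10000, 0.5399, 0.3559),
    (0.022, 29.12, 14500, 0.5185, 0.3370),
]

VALIDATION_LOSS, WER = 3, 4  # column indices into each row


def best_row(rows, column):
    """Return the row with the smallest value in the given column."""
    return min(rows, key=lambda row: row[column])


lowest_loss_row = best_row(ROWS, VALIDATION_LOSS)
lowest_wer_row = best_row(ROWS, WER)
print("lowest validation loss at step", lowest_loss_row[2])
print("lowest WER at step", lowest_wer_row[2])
```

Which checkpoint counts as "best" therefore depends on the selection metric: choosing by validation loss would favour an early checkpoint, while choosing by WER favours the last one.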

Framework versions

Transformers 4.17.0
PyTorch 1.11.0+cu113
Datasets 1.18.3
Tokenizers 0.12.1

📄 License

This project is licensed under the Apache-2.0 license.
