Wav2vec2-base-timit Open-source Speech Recognition Model - Free Deployment for English Speech-to-Text Conversion

Wav2vec2 Base Timit Demo Google Colab

Developed by dasolj

A speech recognition model based on facebook/wav2vec2-base and fine-tuned on the TIMIT dataset, specializing in English speech-to-text tasks

Speech Recognition

Transformers

Open Source License: Apache-2.0 #Speech Recognition #TIMIT Dataset #Low Word Error Rate

Downloads 127

Release Time: 6/27/2022

Model Overview

This model is a fine-tuned version of wav2vec2-base, trained on the TIMIT dataset for English speech recognition. It converts English speech into text.

Model Features

Fine-tuned on wav2vec2-base

Starts from the pretrained facebook/wav2vec2-base checkpoint and is fine-tuned for English speech recognition

Low Word Error Rate

Achieves a Word Error Rate (WER) of 0.3424 on the evaluation set

End-to-End Speech Recognition

Directly converts raw audio input into text output
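As a rough illustration of the audio-in, text-out flow described above, the sketch below wraps the Transformers automatic-speech-recognition pipeline in a small helper. The repository id is an assumption inferred from the developer and model names on this page and has not been verified, and `transcribe` is a hypothetical helper name. The pipeline needs `transformers`, `torch`, and ffmpeg available to decode audio files.

```python
# Assumption: Hub repository id inferred from "Developed by dasolj" and the
# model name on this page; adjust it if the actual repository differs.
MODEL_ID = "dasolj/wav2vec2-base-timit-demo-google-colab"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Return the transcript of a single audio file (hypothetical helper)."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    # The pipeline loads the model and its processor, decodes the audio file,
    # resamples it to the rate the model expects, and decodes the model output.
    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(audio_path)
    return result["text"]
```

Calling `transcribe("recording.wav")` on an English recording would return the recognized text as a string. Given the reported WER of 0.3424, the output should be expected to contain frequent errors.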

Model Capabilities

English Speech Recognition

Audio-to-Text

Automatic Speech Transcription

Use Cases

Speech Transcription

Automated Meeting Minutes

Automatically converts English meeting recordings into text transcripts

Word Error Rate around 34% on the model's evaluation set

Voice Note Conversion

Converts English voice notes into editable text

Assistive Technology

Real-time Caption Generation

Generates real-time captions for English video content

🚀 wav2vec2-base-timit-demo-google-colab

This model is a fine-tuned version of facebook/wav2vec2-base on the TIMIT dataset (the model card metadata records the dataset as "None"), intended for English speech recognition.

🚀 Quick Start

The model achieves the following results on the evaluation set:

Loss: 0.5501
Wer: 0.3424
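For readers unfamiliar with the metric, the following is a minimal sketch of how word error rate is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name is illustrative, and this is not necessarily the exact implementation used to produce the figure above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
score = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{score:.4f}")
```

On this scale, a WER of 0.3424 means the edit distance is about 34% of the number of reference words.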

📚 Documentation

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 1000
num_epochs: 30
mixed_precision_training: Native AMP
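To make the scheduler settings above concrete, here is a small sketch of a linear schedule with warmup: the learning rate rises linearly from 0 to 0.0001 over the first 1000 steps, then decays linearly to 0 at the end of training. The total step count is not stated in the card; `total_steps=15_000` is an assumed round figure based on the results table reaching step 14500 at epoch 29.12. The function name is illustrative.

```python
def linear_schedule_lr(
    step: int,
    base_lr: float = 1e-4,       # learning_rate from the card
    warmup_steps: int = 1000,    # lr_scheduler_warmup_steps from the card
    total_steps: int = 15_000,   # assumption: not stated in the card
) -> float:
    """Learning rate at a given optimizer step for a linear warmup/decay schedule."""
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: scale linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for s in (0, 500, 1000, 8000, 15_000):
    print(s, linear_schedule_lr(s))
```

Under these assumptions the rate peaks at step 1000 and is back to half the peak value at step 8000.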

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
3.5448	1.0	500	2.5044	1.0
1.0167	2.01	1000	0.5435	0.5278
0.4453	3.01	1500	0.4450	0.4534
0.3	4.02	2000	0.4401	0.4245
0.2304	5.02	2500	0.4146	0.4022
0.1889	6.02	3000	0.4241	0.3927
0.1573	7.03	3500	0.4545	0.3878
0.1363	8.03	4000	0.4936	0.3940
0.1213	9.04	4500	0.4964	0.3806
0.108	10.04	5000	0.4931	0.3826
0.0982	11.04	5500	0.5373	0.3778
0.0883	12.05	6000	0.4978	0.3733
0.0835	13.05	6500	0.5189	0.3728
0.0748	14.06	7000	0.4608	0.3692
0.068	15.06	7500	0.4827	0.3608
0.0596	16.06	8000	0.5022	0.3661
0.056	17.07	8500	0.5482	0.3646
0.0565	18.07	9000	0.5158	0.3573
0.0487	19.08	9500	0.4910	0.3513
0.0444	20.08	10000	0.5771	0.3580
0.045	21.08	10500	0.5160	0.3539
0.0363	22.09	11000	0.5367	0.3503
0.0313	23.09	11500	0.5773	0.3500
0.0329	24.1	12000	0.5683	0.3508
0.0297	25.1	12500	0.5355	0.3464
0.0272	26.1	13000	0.5317	0.3450
0.0256	27.11	13500	0.5602	0.3443
0.0242	28.11	14000	0.5586	0.3419
0.0239	29.12	14500	0.5501	0.3424
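The table can be scanned programmatically to see where each metric bottoms out. The sketch below copies the step, validation loss, and WER columns from the table above and picks the row with the lowest value of each; variable names are illustrative.

```python
# (step, validation_loss, wer) copied from the training results table.
ROWS = [
    (500, 2.5044, 1.0), (1000, 0.5435, 0.5278), (1500, 0.4450, 0.4534),
    (2000, 0.4401, 0.4245), (2500, 0.4146, 0.4022), (3000, 0.4241, 0.3927),
    (3500, 0.4545, 0.3878), (4000, 0.4936, 0.3940), (4500, 0.4964, 0.3806),
    (5000, 0.4931, 0.3826), (5500, 0.5373, 0.3778), (6000, 0.4978, 0.3733),
    (6500, 0.5189, 0.3728), (7000, 0.4608, 0.3692), (7500, 0.4827, 0.3608),
    (8000, 0.5022, 0.3661), (8500, 0.5482, 0.3646), (9000, 0.5158, 0.3573),
    (9500, 0.4910, 0.3513), (10000, 0.5771, 0.3580), (10500, 0.5160, 0.3539),
    (11000, 0.5367, 0.3503), (11500, 0.5773, 0.3500), (12000, 0.5683, 0.3508),
    (12500, 0.5355, 0.3464), (13000, 0.5317, 0.3450), (13500, 0.5602, 0.3443),
    (14000, 0.5586, 0.3419), (14500, 0.5501, 0.3424),
]

best_by_loss = min(ROWS, key=lambda row: row[1])  # lowest validation loss
best_by_wer = min(ROWS, key=lambda row: row[2])   # lowest word error rate

print("lowest validation loss:", best_by_loss)
print("lowest WER:", best_by_wer)
```

Validation loss is lowest early in training (step 2500) while WER keeps improving until the last few checkpoints, so the two metrics favor different checkpoints. The reported final result (WER 0.3424 at step 14500) is marginally above the lowest logged WER.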

Framework versions

Transformers 4.17.0
Pytorch 1.11.0+cu113
Datasets 1.18.3
Tokenizers 0.12.1

📄 License

This project is licensed under the Apache-2.0 license.
