wav2vec2-base-timit-demo-colab57 Open-source Speech Recognition Model

Wav2vec2 Base Timit Demo Colab57

Developed by hassnain

A speech recognition model fine-tuned from facebook/wav2vec2-base on the TIMIT dataset, reaching a Word Error Rate (WER) of 0.4593 on the evaluation set.

Downloads 16

Release Date: 5/1/2022

Model Overview

This is an automatic speech recognition (ASR) model for English, fine-tuned from a pretrained wav2vec2 model.

Word Error Rate

Achieves a Word Error Rate (WER) of 0.4593 on the evaluation set, meaning roughly 46 word errors per 100 reference words.
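For readers unfamiliar with the metric, the following is a minimal sketch of how WER is commonly computed: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions, not taken from the model's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (one fewer than the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution ("sat" -> "sit") and one deletion ("mat")
# against a five-word reference.
print(word_error_rate("the cat sat on mat", "the cat sit on"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.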

Based on wav2vec2 Architecture

Uses facebook/wav2vec2-base as the base model for fine-tuning.

End-to-End Training

Trained end to end, learning the mapping from speech audio to text directly.
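As an illustration of how frame-level outputs become text, here is a minimal sketch of greedy CTC decoding. It assumes the model was fine-tuned with a CTC head, as is usual for wav2vec2 ASR fine-tuning; the page itself does not state this. The vocabulary, the blank id, and the `|` word delimiter below are hypothetical stand-ins, not this model's actual tokenizer.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0, word_delimiter="|"):
    """Collapse repeated frame predictions, drop blanks, and join characters."""
    # Step 1: merge consecutive duplicates (one emitted symbol per run).
    collapsed = [token for token, _ in groupby(frame_ids)]
    # Step 2: remove the blank symbol, which separates genuine repeats.
    chars = [id_to_char[token] for token in collapsed if token != blank_id]
    # Step 3: turn the word delimiter into spaces.
    return "".join(chars).replace(word_delimiter, " ").strip()


# Hypothetical character vocabulary; id 0 plays the role of the CTC blank.
vocab = {0: "<pad>", 1: "|", 2: "h", 3: "e", 4: "l", 5: "o", 6: "i"}

# Per-frame argmax ids for "hello hi": the blank between the two "l" runs
# keeps them from being merged into one letter.
frames = [2, 2, 0, 3, 4, 4, 0, 4, 5, 5, 1, 1, 2, 6, 6]
print(ctc_greedy_decode(frames, vocab))
```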

English Speech Recognition

Speech-to-Text

Speech Transcription

Meeting Minutes Transcription

Automatically converts English meeting recordings into text transcripts.

Word Error Rate of about 46% on the model's evaluation set

Voice Command Recognition

Transcribes English voice commands into text that downstream software can map to executable commands.

Training Loss	Epoch	Step	Validation Loss	WER
4.9876	7.04	500	3.1483	1.0
1.4621	14.08	1000	0.6960	0.6037
0.4404	21.13	1500	0.6392	0.5630
0.2499	28.17	2000	0.6738	0.5281
0.1732	35.21	2500	0.6789	0.4952
0.1347	42.25	3000	0.7328	0.4835
0.1044	49.3	3500	0.7258	0.4840
0.0896	56.34	4000	0.7328	0.4593
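The training log can be read programmatically to compare checkpoints. The sketch below copies the logged values into a small structure (the `LogRow` type and field names are illustrative assumptions) and looks up the step with the lowest WER and the step with the lowest validation loss, which need not coincide.

```python
from typing import NamedTuple


class LogRow(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float
    wer: float


# Values copied from the training log on this page.
rows = [
    LogRow(4.9876, 7.04, 500, 3.1483, 1.0),
    LogRow(1.4621, 14.08, 1000, 0.6960, 0.6037),
    LogRow(0.4404, 21.13, 1500, 0.6392, 0.5630),
    LogRow(0.2499, 28.17, 2000, 0.6738, 0.5281),
    LogRow(0.1732, 35.21, 2500, 0.6789, 0.4952),
    LogRow(0.1347, 42.25, 3000, 0.7328, 0.4835),
    LogRow(0.1044, 49.3, 3500, 0.7258, 0.4840),
    LogRow(0.0896, 56.34, 4000, 0.7328, 0.4593),
]

best_by_wer = min(rows, key=lambda r: r.wer)
best_by_val_loss = min(rows, key=lambda r: r.val_loss)

print(f"lowest WER: {best_by_wer.wer} at step {best_by_wer.step}")
print(f"lowest validation loss: {best_by_val_loss.val_loss} at step {best_by_val_loss.step}")
```

In this log the lowest validation loss is recorded at step 1500, while WER keeps improving through step 4000, so the two criteria would select different checkpoints.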
