wav2vec2-base-timit-demo-colab647 Open-source Speech Recognition Model - English Speech-to-Text Fine-tuned from facebook/wav2vec2-base

Wav2vec2 Base Timit Demo Colab647

Developed by hassnain

This model is a fine-tuned speech recognition model based on facebook/wav2vec2-base, achieving a word error rate of 0.4799 on the TIMIT dataset.

Downloads 16

Release Time: 5/1/2022

Model Overview

This is a fine-tuned model for speech recognition tasks, based on the wav2vec2 architecture, suitable for English speech-to-text applications.
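A minimal sketch of how a model like this is typically called for English speech-to-text, using the Hugging Face transformers pipeline. The identifier hassnain/wav2vec2-base-timit-demo-colab647 is an assumption inferred from the developer and model name on this page, not a confirmed Hub path, and the function name transcribe is illustrative. It also assumes transformers and ffmpeg are installed.

```python
def transcribe(audio_path, model_id="hassnain/wav2vec2-base-timit-demo-colab647"):
    """Transcribe one English audio file and return the recognized text.

    model_id is an ASSUMED Hub identifier (developer name + model name);
    replace it with the actual path if it differs.
    """
    # Imported lazily so this module loads even without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # Given a file path, the pipeline decodes the audio (this step needs ffmpeg)
    # and resamples it to the sampling rate the model expects.
    return asr(audio_path)["text"]
```

For example, transcribe("meeting.wav") would return the transcript as a single string, where meeting.wav is a hypothetical local recording.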

Word Error Rate

Reached a word error rate (WER) of 0.4799 on the evaluation set after 2,000 training steps, down from 1.0 at step 500. Roughly 48 of every 100 reference words are transcribed incorrectly.
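To make the metric concrete, the sketch below computes word error rate in the standard way: the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. It is an illustration in plain Python, not necessarily the evaluation code used for this model, and the function name word_error_rate is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Return WER = (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.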

Based on wav2vec2 Architecture

Uses facebook/wav2vec2-base as the base model, which supplies pretrained speech feature extraction that is then fine-tuned for transcription.

Efficient Training

Uses mixed-precision training, which reduces memory use and speeds up training on supported GPUs, together with a linear learning rate scheduler.
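A linear learning rate schedule can be sketched as follows: the rate optionally ramps up linearly during warmup, then decays linearly to zero by the final step. The peak rate, warmup length, and total step count are not stated on this page, so peak_lr, warmup_steps, and total_steps below are hypothetical parameters, and linear_lr is an illustrative name rather than a library function.

```python
def linear_lr(step, peak_lr, total_steps, warmup_steps=0):
    """Learning rate at a given step under linear warmup and linear decay.

    All parameter values are illustrative assumptions; the page does not
    report the ones used for this model.
    """
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay phase: fall linearly from peak_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)
```

For instance, with total_steps=2000 and warmup_steps=1000, the rate peaks at step 1000 and is half the peak at step 1500.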

English Speech Recognition

Speech-to-Text

Speech Transcription

Meeting Minutes

Convert English meeting recordings into text transcripts

Word error rate around 48%

Voice Notes

Convert English voice notes into searchable text

Training Loss	Epoch	Step	Validation Loss	WER
5.2072	7.04	500	3.7757	1.0
1.2053	14.08	1000	0.6128	0.5648
0.3922	21.13	1500	0.5547	0.5035
0.2157	28.17	2000	0.5534	0.4799
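The table can be read programmatically to pick the best checkpoint and to recover the number of optimizer steps per epoch. The sketch below copies the four rows as they appear above; the variable names are illustrative.

```python
# Rows copied from the results table:
# (training_loss, epoch, step, validation_loss, wer)
rows = [
    (5.2072, 7.04, 500, 3.7757, 1.0),
    (1.2053, 14.08, 1000, 0.6128, 0.5648),
    (0.3922, 21.13, 1500, 0.5547, 0.5035),
    (0.2157, 28.17, 2000, 0.5534, 0.4799),
]

# Checkpoint with the lowest word error rate.
best = min(rows, key=lambda row: row[4])

# Steps divided by epochs gives the steps per epoch, rounded to an integer.
steps_per_epoch = [round(step / epoch) for _, epoch, step, _, _ in rows]

print(f"best step: {best[2]}, WER: {best[4]}")
print(f"steps per epoch: {steps_per_epoch}")
```

Every row gives about 71 steps per epoch, and the lowest WER is at step 2000. Validation loss barely moves between steps 1500 and 2000 (0.5547 to 0.5534) while training loss keeps falling, which suggests the gains were flattening out by the end of the run.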
