wav2vec2-base-timit-demo-colab11 is an open-source English speech recognition model that reaches a word error rate of 0.4348 on its evaluation set.

Wav2vec2 Base Timit Demo Colab11

Developed by sameearif88

This speech recognition model is fine-tuned from facebook/wav2vec2-base and achieves a word error rate (WER) of 0.4348 on the TIMIT dataset.

Downloads 18

Release Time: 5/1/2022

Model Overview

This is an English speech recognition model fine-tuned from the wav2vec2 architecture. It is suited to tasks that convert English speech to text.
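A minimal sketch of how such a model is typically used for transcription, assuming the Hugging Face transformers library and its automatic-speech-recognition pipeline. The repository ID below is an assumption inferred from the developer and model names on this page and has not been verified; the function name transcribe and the file name in the usage comment are hypothetical.

```python
# Assumed repository ID, built from the developer and model names on this page.
MODEL_ID = "sameearif88/wav2vec2-base-timit-demo-colab11"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Return the text transcript of one audio file.

    transformers is imported lazily so this module loads even where the
    library is not installed. Decoding a file path also needs ffmpeg.
    """
    from transformers import pipeline

    # wav2vec2-base was pretrained on 16 kHz audio; when given a file path,
    # the pipeline resamples the audio to the rate the model expects.
    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(audio_path)
    return result["text"]


# Usage (hypothetical file name):
#   text = transcribe("meeting.wav")
```

The first call downloads the model weights, so it requires network access.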

Word Error Rate

Achieved a word error rate (WER) of 0.4348 on the evaluation set at training step 2000, the lowest value recorded during training.
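For reference, word error rate is conventionally the word-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the model output, divided by the number of reference words. The sketch below illustrates that definition in plain Python. The function name word_error_rate is illustrative, and this is not the evaluation script that produced the 0.4348 figure, which may have normalized the text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from output
            insertion = curr[j - 1] + 1   # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


# One substituted word out of three reference words.
print(word_error_rate("the cat sat", "the bat sat"))
```

Because insertions count as errors, WER can exceed 1.0 when the output is much longer than the reference.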

Based on wav2vec2 Architecture

Uses Facebook's wav2vec2-base as the foundational model.

Mixed Precision Training

Trained with PyTorch native automatic mixed precision (AMP), which speeds up training and reduces memory use.

English Speech Recognition

Speech-to-Text

Speech Transcription

Meeting Minutes

Automatically converts English meeting recordings into text transcripts.

Word error rate of approximately 43.48% on the evaluation set, or roughly 43 errors per 100 reference words

Voice Notes

Converts English voice notes into searchable text.

Training Loss	Epoch	Step	Validation Loss	WER
4.2269	3.52	500	1.1191	0.7121
0.8297	7.04	1000	0.6064	0.5228
0.4988	10.56	1500	0.5057	0.4627
0.3635	14.08	2000	0.4922	0.4348
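The log above can be checked with a few lines of Python. This sketch copies the four rows as they appear, picks the checkpoint with the lowest WER, estimates the number of optimizer steps per epoch from the Step and Epoch columns, and computes the relative WER reduction between the first and best evaluations. The names LOG and EvalRow are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass

# Rows copied from the table: training loss, epoch, step, validation loss, WER.
LOG = """\
4.2269 3.52 500 1.1191 0.7121
0.8297 7.04 1000 0.6064 0.5228
0.4988 10.56 1500 0.5057 0.4627
0.3635 14.08 2000 0.4922 0.4348"""


@dataclass
class EvalRow:
    train_loss: float
    epoch: float
    step: int
    val_loss: float
    wer: float


rows = [
    EvalRow(float(tl), float(ep), int(st), float(vl), float(wer))
    for tl, ep, st, vl, wer in (line.split() for line in LOG.splitlines())
]

# Checkpoint with the lowest word error rate.
best = min(rows, key=lambda r: r.wer)

# Steps per epoch implied by the first row (the other rows give the same ratio).
steps_per_epoch = rows[0].step / rows[0].epoch

# Relative WER reduction from the first evaluation to the best one.
relative_reduction = (rows[0].wer - best.wer) / rows[0].wer

print(f"best step: {best.step}, WER: {best.wer}")
print(f"steps per epoch: {steps_per_epoch:.1f}")
print(f"relative WER reduction: {relative_reduction:.1%}")
```

Both validation loss and WER are still falling at the last logged step, though the gains shrink with each evaluation.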
