wav2vec2-base-timit-demo-colab Speech Recognition Model - Open-source and Free English Speech Recognition

Wav2vec2 Base Timit Demo Colab

Developed by obokkkk

A speech recognition model fine-tuned from facebook/wav2vec2-base on the TIMIT dataset, reaching a Word Error Rate (WER) of 0.3468 on its evaluation set.

Downloads 20

Release Time: 4/20/2022

Model Overview

This is an English speech recognition model, fine-tuned from the wav2vec2 architecture, for converting speech to text.
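The card does not describe the output layer. Assuming this checkpoint follows the usual wav2vec2 fine-tuning recipe (a character-level CTC head, as in Hugging Face's `Wav2Vec2ForCTC`), the model emits one label distribution per audio frame, and text is recovered by greedy CTC decoding: take the best label per frame, merge repeats, drop blanks. The six-symbol vocabulary below is hypothetical; the real tokenizer's symbols and ids will differ.

```python
from itertools import groupby

# Hypothetical toy vocabulary for illustration only: id 0 is the CTC blank
# (often the pad token in wav2vec2 tokenizers) and "|" marks word boundaries.
VOCAB = ["<pad>", "|", "a", "c", "t", "s"]
BLANK_ID = 0


def argmax_ids(logits):
    """Pick the highest-scoring label id for each audio frame."""
    return [max(range(len(frame)), key=frame.__getitem__) for frame in logits]


def ctc_greedy_decode(frame_ids, vocab=VOCAB, blank_id=BLANK_ID):
    """Turn per-frame label ids into text with the CTC collapsing rules."""
    # 1. Merge runs of the same id (one character spread over adjacent frames).
    collapsed = [label for label, _ in groupby(frame_ids)]
    # 2. Drop blanks; a blank between two equal ids keeps them as separate characters.
    text = "".join(vocab[i] for i in collapsed if i != blank_id)
    # 3. Split on the word delimiter and rejoin with single spaces.
    return " ".join(word for word in text.split("|") if word)


# Frames for "cat cats": c c _ a t t | c a _ t s _   (_ = blank)
frames = [3, 3, 0, 2, 4, 4, 1, 3, 2, 0, 4, 5, 0]
print(ctc_greedy_decode(frames))
```

With the real model, the logits would come from the CTC head and the vocabulary from the checkpoint's tokenizer; if the checkpoint ships a `Wav2Vec2Processor`, its `batch_decode` method performs this collapsing step.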

Word Error Rate

Achieves a Word Error Rate (WER) of 0.3468 on the evaluation set, i.e. roughly 35 word errors (substitutions, deletions, or insertions) per 100 reference words.
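WER is the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words: WER = (S + D + I) / N for S substitutions, D deletions, and I insertions. The card does not say which tool computed its score; the following is a minimal, dependency-free sketch of the standard definition, not the evaluation code used for this model.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed so
    # far and the first j hypothesis words (dynamic-programming Levenshtein).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("she had your dark suit", "she had a dark suit"))
```

Because insertions count as errors, WER can exceed 1.0 (for example, one reference word transcribed as three unrelated words gives 3.0), so 1 - WER is only an informal "accuracy".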

Based on wav2vec2 Architecture

Uses Facebook's pretrained wav2vec2-base model as its foundation; wav2vec2 learns speech representations directly from raw audio through self-supervised pretraining.
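wav2vec2 extracts features with a stack of 1-D convolutions over the raw 16 kHz waveform. The sketch below assumes the default wav2vec2-base encoder settings (the `conv_kernel` and `conv_stride` values of Hugging Face's `Wav2Vec2Config`) and computes how many feature frames a clip yields; the total stride is 320 samples (20 ms) and each frame sees 400 samples (25 ms).

```python
# Default convolutional feature-encoder settings assumed for wav2vec2-base.
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)
SAMPLE_RATE = 16_000  # wav2vec2-base expects 16 kHz mono audio


def num_output_frames(num_samples: int) -> int:
    """Number of feature frames produced for a raw waveform of num_samples."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        if length < kernel:
            return 0  # clip too short for this layer to produce any output
        # Output length of an unpadded 1-D convolution.
        length = (length - kernel) // stride + 1
    return length


print(num_output_frames(SAMPLE_RATE))  # one second of audio -> 49
```

At about 49 frames per second, a CTC-style output head has ample frames per character for read speech such as TIMIT.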

Fine-tuned Training

Fine-tuned on the TIMIT corpus of read American English speech.

English Speech Recognition

Speech-to-Text

Speech Transcription

Meeting Minutes

Automatically convert English meeting recordings into text transcripts

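Word accuracy of roughly 65.32% (1 - WER) on the TIMIT evaluation set; results on conversational meeting audio, which differs from TIMIT's read speech, may differ.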

Voice Notes

Convert English voice notes into searchable text

Training Results

Training Loss	Epoch	Step	Validation Loss	WER
3.4408	4.0	500	1.2302	0.9116
0.561	8.0	1000	0.4809	0.4320
0.2091	12.0	1500	0.4285	0.3880
0.1221	16.0	2000	0.4448	0.3665
0.0858	20.0	2500	0.4622	0.3585
0.0597	24.0	3000	0.4621	0.3517
0.0453	28.0	3500	0.4779	0.3468
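In the table, validation loss bottoms out at step 1500 (0.4285) and then creeps upward, while WER keeps improving to 0.3468 at step 3500. The short sketch below, with rows transcribed from the table, makes that comparison explicit. The card does not state how the released checkpoint was selected, although its reported WER matches the final row.

```python
from typing import NamedTuple


class Row(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float
    wer: float


# Rows transcribed from the training-results table above.
ROWS = [
    Row(3.4408, 4.0, 500, 1.2302, 0.9116),
    Row(0.561, 8.0, 1000, 0.4809, 0.4320),
    Row(0.2091, 12.0, 1500, 0.4285, 0.3880),
    Row(0.1221, 16.0, 2000, 0.4448, 0.3665),
    Row(0.0858, 20.0, 2500, 0.4622, 0.3585),
    Row(0.0597, 24.0, 3000, 0.4621, 0.3517),
    Row(0.0453, 28.0, 3500, 0.4779, 0.3468),
]

best_by_wer = min(ROWS, key=lambda r: r.wer)
best_by_val_loss = min(ROWS, key=lambda r: r.val_loss)
steps_per_epoch = ROWS[0].step / ROWS[0].epoch  # consistent across all rows

print(f"lowest WER: step {best_by_wer.step} ({best_by_wer.wer})")
print(f"lowest validation loss: step {best_by_val_loss.step} ({best_by_val_loss.val_loss})")
print(f"steps per epoch: {steps_per_epoch}")
```

Rising validation loss alongside falling WER is common in CTC fine-tuning, which is why checkpoints are often chosen by WER rather than by loss.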
