wav2vec2-base-timit-demo-colab: A Free, Open-Source English Speech-to-Text Model

Wav2vec2 Base Timit Demo Colab

Developed by wasilkas

A speech recognition model fine-tuned from facebook/wav2vec2-base on the TIMIT dataset, reaching a Word Error Rate (WER) of 0.3382 on the evaluation set

Downloads: 24

Release Date: 3/20/2022

Model Overview

This is an English speech recognition model built on the wav2vec2 architecture and fine-tuned on the TIMIT dataset.

Word Error Rate

Achieves a Word Error Rate (WER) of 0.3382 on the TIMIT evaluation set, i.e. roughly one word error for every three reference words
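For reference, WER counts the word-level substitutions (S), deletions (D) and insertions (I) needed to turn the model's transcript into the reference text, divided by the number of reference words N: WER = (S + D + I) / N. The sketch below computes this with a standard dynamic-programming edit distance in plain Python. The name `word_error_rate` is hypothetical, and this is not necessarily the exact scoring code behind the 0.3382 figure (text normalization, for example, may differ).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: (S + D + I) / N via word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # TIMIT's SA1 prompt sentence, with one word substituted in the hypothesis.
    ref = "she had your dark suit in greasy wash water all year"
    hyp = "she had a dark suit in greasy wash water all year"
    print(f"{word_error_rate(ref, hyp):.4f}")  # prints 0.0909 (1 error / 11 words)
```

Because insertions are counted, WER can exceed 1.0 when a transcript contains many extra words.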

Based on wav2vec2 Architecture

Uses Facebook's wav2vec2-base (facebook/wav2vec2-base) as the base model

Lightweight

Builds on the base (rather than large) wav2vec2 variant, roughly 95M parameters, so inference needs relatively modest computational resources

English Speech Recognition

Audio-to-Text Conversion

Speech Transcription

English Speech Transcription

Converts English speech content into text

Word Error Rate: 0.3382
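The card does not describe how model output becomes text. Assuming the model follows the common Hugging Face recipe for fine-tuning wav2vec2 (a CTC head that predicts one character token per audio frame, with the padding token acting as the CTC blank and `|` marking word boundaries), the final step is greedy CTC decoding: take the best token per frame, merge consecutive repeats, drop blanks, and turn delimiters into spaces. The sketch below is pure Python with a toy vocabulary; `greedy_ctc_decode` and `one_hot` are hypothetical helpers, not the model's actual tokenizer.

```python
from typing import List, Sequence

# Assumed special tokens, following the usual Hugging Face wav2vec2 CTC setup;
# the real vocabulary would come from the model's own tokenizer.
BLANK = "<pad>"       # assumed CTC blank token
WORD_DELIMITER = "|"  # assumed word-boundary token


def greedy_ctc_decode(frame_scores: Sequence[Sequence[float]], vocab: Sequence[str]) -> str:
    """Turn per-frame token scores into text with greedy (best-path) CTC decoding."""
    # 1. Take the highest-scoring token index for every audio frame.
    best_path = [max(range(len(scores)), key=lambda i: scores[i]) for scores in frame_scores]

    pieces: List[str] = []
    previous = None
    for index in best_path:
        # 2. Consecutive repeats of the same token collapse into one.
        if index == previous:
            continue
        previous = index
        token = vocab[index]
        # 3. Blanks are dropped; because they still update `previous`,
        #    a blank between two identical letters preserves a genuine double letter.
        if token == BLANK:
            continue
        pieces.append(" " if token == WORD_DELIMITER else token)

    # 4. Collapse extra spaces from leading, trailing, or repeated delimiters.
    return " ".join("".join(pieces).split())


def one_hot(index: int, size: int) -> List[float]:
    """Hypothetical helper: a score vector that is 1.0 at `index` and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(size)]


if __name__ == "__main__":
    vocab = [BLANK, WORD_DELIMITER, "a", "d", "e", "h", "s"]
    # Toy frame-level best path with repeats and blanks, as a CTC model might emit.
    frames = ["s", "s", "h", BLANK, "e", "e", WORD_DELIMITER, WORD_DELIMITER,
              "h", "a", "a", BLANK, "d"]
    scores = [one_hot(vocab.index(tok), len(vocab)) for tok in frames]
    print(greedy_ctc_decode(scores, vocab))  # prints "she had"
```

Greedy decoding is the simplest option; beam search with a language model is a common alternative when accuracy matters more than speed.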

Education

Pronunciation Assessment

Can be used in pronunciation assessment systems for English learners
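The card gives no details on how such a system would use the model. One hypothetical approach is to have a learner read a known prompt, transcribe the recording, and flag prompt words the transcript does not reproduce. The sketch below does this with Python's standard `difflib.SequenceMatcher`; `flag_missed_words` is an illustrative name, and with a WER around 0.34 many flags would reflect recognition errors rather than pronunciation errors.

```python
import difflib
from typing import List


def flag_missed_words(prompt: str, transcript: str) -> List[str]:
    """Hypothetical helper: prompt words that the ASR transcript replaced or omitted."""
    # Lowercase and split on whitespace; a real system would also strip punctuation.
    expected = prompt.lower().split()
    heard = transcript.lower().split()
    matcher = difflib.SequenceMatcher(a=expected, b=heard, autojunk=False)
    flagged: List[str] = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # 'replace' and 'delete' spans cover prompt words not heard as expected;
        # 'insert' spans are extra words in the transcript and are ignored here.
        if tag in ("replace", "delete"):
            flagged.extend(expected[i1:i2])
    return flagged


if __name__ == "__main__":
    prompt = "She had your dark suit in greasy wash water all year"
    transcript = "she had a dark suit in greasy wash water all year"  # hypothetical ASR output
    print(flag_missed_words(prompt, transcript))  # prints ['your']
```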

Training Results

Training Loss	Epoch	Step	Validation Loss	WER
3.4787	4.0	500	1.4190	0.9939
0.5835	8.0	1000	0.4711	0.4370
0.219	12.0	1500	0.4555	0.3994
0.1251	16.0	2000	0.4515	0.3654
0.0834	20.0	2500	0.4923	0.3564
0.0632	24.0	3000	0.4410	0.3399
0.0491	28.0	3500	0.4491	0.3382
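The training log above records one evaluation every 500 steps. A short script makes two points explicit: 500 steps correspond to 4.0 epochs (125 training steps per epoch), and the lowest WER (0.3382 at step 3500) does not coincide with the lowest validation loss (0.4410 at step 3000). The sketch below uses only Python's standard `csv` module on values copied from the table; `summarize` is an illustrative name.

```python
import csv
import io

# Values copied from the training log above (tab-separated).
TRAINING_LOG = (
    "Training Loss\tEpoch\tStep\tValidation Loss\tWER\n"
    "3.4787\t4.0\t500\t1.4190\t0.9939\n"
    "0.5835\t8.0\t1000\t0.4711\t0.4370\n"
    "0.219\t12.0\t1500\t0.4555\t0.3994\n"
    "0.1251\t16.0\t2000\t0.4515\t0.3654\n"
    "0.0834\t20.0\t2500\t0.4923\t0.3564\n"
    "0.0632\t24.0\t3000\t0.4410\t0.3399\n"
    "0.0491\t28.0\t3500\t0.4491\t0.3382\n"
)


def summarize(log_text: str) -> dict:
    """Parse the log and report steps per epoch and the best checkpoints."""
    rows = [
        {key: float(value) for key, value in row.items()}
        for row in csv.DictReader(io.StringIO(log_text), delimiter="\t")
    ]
    first = rows[0]
    best_wer = min(rows, key=lambda r: r["WER"])
    best_val = min(rows, key=lambda r: r["Validation Loss"])
    return {
        # 500 steps at epoch 4.0 -> 125 training steps per epoch.
        "steps_per_epoch": first["Step"] / first["Epoch"],
        "best_wer": best_wer["WER"],
        "best_wer_step": int(best_wer["Step"]),
        "best_val_loss": best_val["Validation Loss"],
        "best_val_loss_step": int(best_val["Step"]),
    }


if __name__ == "__main__":
    for name, value in summarize(TRAINING_LOG).items():
        print(f"{name}: {value}")
```

Because the batch size is not reported, the number of training examples cannot be inferred from the 125 steps per epoch.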
