wav2vec2-base-timit-demo-colab971
This model is a fine-tuned version of facebook/wav2vec2-base on an unspecified dataset (recorded as "None" in the training metadata). It achieves the results reported in the Training results section below on the evaluation set.
Quick Start
This section provides a high-level overview of the model and its performance. The model is based on the pre-trained facebook/wav2vec2-base checkpoint and fine-tuned on a dataset that is not specified in this card. The evaluation results below report its performance in terms of validation loss and word error rate (WER).
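As a quick-start illustration, the sketch below shows how a fine-tuned wav2vec2 CTC model of this kind can typically be loaded and run with the transformers library. The repository id is taken from the model name in this card and, together with the 16 kHz mono input, should be treated as an assumption rather than a confirmed detail.

```python
import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

# Repository id is assumed from the model name shown in this card.
MODEL_ID = "wav2vec2-base-timit-demo-colab971"

processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)

def transcribe(speech):
    """`speech` is assumed to be a 1-D float array of 16 kHz mono audio samples."""
    inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits           # (batch, time, vocab)
    predicted_ids = torch.argmax(logits, dim=-1)  # greedy CTC decoding
    return processor.batch_decode(predicted_ids)[0]
```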
Documentation
Model description
This model is a fine-tuned version of facebook/wav2vec2-base. More detailed information, such as the model's architecture and how fine-tuning affects its performance, has yet to be provided.
Intended uses & limitations
More information about the intended uses of this model and its limitations needs to be added, such as the types of speech recognition tasks it is best suited for and any scenarios where it may not perform well.
Training and evaluation data
Details about the training and evaluation data are lacking. Information such as the source of the data, its size, and the characteristics of the speech samples would be valuable.
Training procedure
Training hyperparameters
The following hyperparameters were used during training (see the TrainingArguments sketch after the list):
- learning_rate: 0.0001
- train_batch_size: 2
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1000
- num_epochs: 30
- mixed_precision_training: Native AMP
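For reference, these settings map roughly onto transformers.TrainingArguments as sketched below. This is an illustrative sketch, not the exact training script used for this model; the output_dir value is an assumption, and the Adam betas and epsilon listed above correspond to the optimizer defaults, so they are not set explicitly.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./wav2vec2-base-timit-demo-colab971",  # assumed output path
    learning_rate=1e-4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=1000,
    num_train_epochs=30,
    fp16=True,  # Native AMP mixed-precision training
)
```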
Training results
| Training Loss | Epoch | Step | Validation Loss | Wer    |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| 4.9461        | 1.77  | 500  | 3.2175          | 1.0    |
| 2.5387        | 3.53  | 1000 | 1.2239          | 0.7851 |
| 0.9632        | 5.3   | 1500 | 0.7275          | 0.6352 |
| 0.6585        | 7.07  | 2000 | 0.6218          | 0.5896 |
| 0.4875        | 8.83  | 2500 | 0.5670          | 0.5651 |
| 0.397         | 10.6  | 3000 | 0.5796          | 0.5487 |
| 0.3298        | 12.37 | 3500 | 0.5870          | 0.5322 |
| 0.2816        | 14.13 | 4000 | 0.5796          | 0.5016 |
| 0.2396        | 15.9  | 4500 | 0.5956          | 0.5040 |
| 0.2019        | 17.67 | 5000 | 0.5911          | 0.4847 |
| 0.1845        | 19.43 | 5500 | 0.6050          | 0.4800 |
| 0.1637        | 21.2  | 6000 | 0.6518          | 0.4927 |
| 0.1428        | 22.97 | 6500 | 0.6247          | 0.4645 |
| 0.1319        | 24.73 | 7000 | 0.6592          | 0.4711 |
| 0.1229        | 26.5  | 7500 | 0.6526          | 0.4556 |
| 0.1111        | 28.27 | 8000 | 0.6551          | 0.4448 |
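The word error rates above are standard WER scores over reference and predicted transcripts. The snippet below is a minimal sketch of how such values can be computed with the jiwer package; jiwer is not listed among the framework versions for this model, so treat its use here as an assumption (the WER metric in the datasets library gives equivalent numbers).

```python
from jiwer import wer  # assumed helper package, not listed in the framework versions below

references = ["the quick brown fox", "hello world"]  # ground-truth transcripts
hypotheses = ["the quick brown fox", "hello word"]   # model predictions

# WER = (substitutions + deletions + insertions) / number of reference words
print(wer(references, hypotheses))  # 1 substitution over 6 reference words ≈ 0.167
```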
Framework versions
- Transformers 4.11.3
- Pytorch 1.11.0+cu113
- Datasets 1.18.3
- Tokenizers 0.10.3
License
This model is licensed under the Apache 2.0 license.