d-l-dl
This model is a fine-tuned version of facebook/wav2vec2-base-960h on an unknown dataset. It offers a practical option for speech-related tasks by building on the pre-trained capabilities of the base model and fine-tuning them further.
Quick Start
This model can be used for speech-related tasks. You can load it with the relevant deep-learning frameworks and run inference or further fine-tuning, as sketched below.
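As a minimal, illustrative sketch (not taken from the original document), the checkpoint can presumably be loaded with the Hugging Face `transformers` automatic-speech-recognition pipeline; the checkpoint path and audio file below are placeholders.

```python
# Minimal sketch (assumption): load the fine-tuned checkpoint with the
# transformers ASR pipeline. "path/to/checkpoint" and "sample.wav" are placeholders.
# Requires the transformers and torch versions listed under "Framework versions".
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="path/to/checkpoint")

# Transcribe a 16 kHz mono audio file (wav2vec2-base-960h expects 16 kHz input).
result = asr("sample.wav")
print(result["text"])
```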
Features
- Based on the pre-trained [facebook/wav2vec2-base-960h] model, which has proven effective in speech processing.
- Fine-tuned on an unknown dataset, potentially adapting it to specific speech characteristics.
Installation
No specific installation steps are provided in the original document.
Usage Examples
No code examples are provided in the original document.
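The sketch below shows one plausible way to run explicit inference with the `Wav2Vec2ForCTC` and `Wav2Vec2Processor` classes; the checkpoint path, audio loading, and use of librosa are assumptions, not details from the original document.

```python
# Hedged example: explicit inference with Wav2Vec2ForCTC. The checkpoint path
# and audio file are placeholders; the fine-tuned vocabulary is unknown.
import torch
import librosa
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

checkpoint = "path/to/checkpoint"  # placeholder
processor = Wav2Vec2Processor.from_pretrained(checkpoint)
model = Wav2Vec2ForCTC.from_pretrained(checkpoint)
model.eval()

# Load audio resampled to 16 kHz, the rate wav2vec2-base-960h was trained on.
speech, _ = librosa.load("sample.wav", sr=16_000)

inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding: take the highest-scoring token at each frame.
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
print(transcription)
```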
Documentation
Model description
This model is a fine-tuned version of [facebook/wav2vec2-base-960h] on an unknown dataset. However, more detailed information about the model's architecture, design philosophy, and intended application scenarios is not available and would need to be explored further.
Intended uses & limitations
The original document does not provide specific information about the intended uses and limitations of this model.
Training and evaluation data
The original document does not provide details about the training and evaluation data, such as the source, size, and characteristics of the dataset.
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0003
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 800
- mixed_precision_training: Native AMP
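As an illustrative reconstruction only, these hyperparameters roughly correspond to the following `transformers.TrainingArguments` configuration; the output directory is a placeholder, and the original training script and dataset are not available.

```python
# Sketch of the listed hyperparameters expressed as TrainingArguments.
# "wav2vec2-finetuned" is a placeholder output directory (assumption).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec2-finetuned",   # placeholder
    learning_rate=3e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,     # effective train batch size: 32
    warmup_steps=500,
    lr_scheduler_type="linear",
    num_train_epochs=800,
    fp16=True,                         # native AMP mixed-precision training
)
```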
Training results
| Training Loss | Epoch | Step | Validation Loss | WER |
|:-------------:|:-----:|:----:|:---------------:|:---:|
| 42.4143 | 49.8  | 100  | 21.5116 | 1.0 |
| 5.9884  | 99.8  | 200  | 31.7976 | 1.0 |
| 4.0043  | 149.8 | 300  | 3.4829  | 1.0 |
| 3.653   | 199.8 | 400  | 3.6417  | 1.0 |
| 3.5207  | 249.8 | 500  | 3.5081  | 1.0 |
| 3.63    | 299.8 | 600  | 3.4836  | 1.0 |
| 3.648   | 349.8 | 700  | 3.4515  | 1.0 |
| 3.6448  | 399.8 | 800  | 3.4647  | 1.0 |
| 3.6872  | 449.8 | 900  | 3.4371  | 1.0 |
| 3.6892  | 499.8 | 1000 | 3.4337  | 1.0 |
| 3.684   | 549.8 | 1100 | 3.4375  | 1.0 |
| 3.6843  | 599.8 | 1200 | 3.4452  | 1.0 |
| 3.6842  | 649.8 | 1300 | 3.4416  | 1.0 |
| 3.6819  | 699.8 | 1400 | 3.4498  | 1.0 |
| 3.6832  | 749.8 | 1500 | 3.4524  | 1.0 |
| 3.6828  | 799.8 | 1600 | 3.4495  | 1.0 |
Framework versions
- Transformers 4.11.3
- Pytorch 1.10.0+cu113
- Datasets 1.18.3
- Tokenizers 0.10.3
Technical Details
The original document does not provide in-depth technical details about the model's design, algorithms, or implementation.
License
The original document does not provide license information.