wav2vec2-tcrs: an open-source speech recognition model fine-tuned from facebook/wav2vec2-large-lv60

Wav2vec2 Tcrs

Developed by neelan-elucidate-ai

A fine-tuned speech recognition model based on facebook/wav2vec2-large-lv60, with a reported word error rate (WER) of 1.0657 on the evaluation set

Speech Recognition

Transformers

Open Source License: Apache-2.0 #Speech Recognition #Fine-tuned Model

Downloads 20

Release Time: 5/4/2022

Model Overview

This is a wav2vec2 model fine-tuned for automatic speech recognition, intended for applications that convert speech to text.

Model Features

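Reported Word Error Rate

Reports a word error rate (WER) of 1.0657 on the evaluation set. A WER above 1.0 means the transcripts contain more word errors (substitutions, deletions and insertions) than the references contain words, so this figure indicates poor recognition accuracy rather than a low error rate.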

Based on wav2vec2 Architecture

Uses facebook/wav2vec2-large-lv60 as the base model, with strong speech feature extraction capabilities

Fine-tuned

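Trained for 100 epochs. Training loss fell from 13.6613 to 1.7745, while validation loss reached its minimum of 2.5722 around epoch 54 and then rose to 2.9550, and the WER never dropped below 1.0.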

Model Capabilities

Speech-to-Text

Automatic Speech Recognition

Use Cases

Speech Transcription

Automatic Meeting Minutes Generation

Automatically converts meeting recordings into text transcripts

Target: highly accurate transcripts

Voice Assistant

Used as the speech recognition module for voice assistants

Target: fast and accurate speech understanding

Accessibility Applications

Real-time Caption Generation

Provides real-time caption services for the hearing impaired

Target: low-latency, high-accuracy caption output

🚀 wav2vec2-tcrs

This model is a fine-tuned version of facebook/wav2vec2-large-lv60. The training dataset is not specified (it is listed as "None"). On the evaluation set the model reaches a loss of 2.9550 and a Word Error Rate (Wer) of 1.0657.
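A minimal sketch of loading the checkpoint for transcription with the Transformers `pipeline` API. The Hub identifier `neelan-elucidate-ai/wav2vec2-tcrs` is an assumption pieced together from the developer and model names on this page, the `transcribe` helper is hypothetical, and decoding an audio file by path assumes ffmpeg is installed.

```python
# Assumed Hub id (developer name + model name from this page); verify before use.
MODEL_ID = "neelan-elucidate-ai/wav2vec2-tcrs"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Return the text the ASR pipeline produces for one audio file."""
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # For a file path, the pipeline decodes the audio (via ffmpeg) and
    # resamples it to the rate the feature extractor expects.
    return asr(audio_path)["text"]


if __name__ == "__main__":
    import sys

    # Usage: python transcribe.py path/to/audio.wav  (hypothetical file)
    if len(sys.argv) > 1:
        print(transcribe(sys.argv[1]))
```

Given the reported WER, the output of such a call is better treated as a debugging aid than as a usable transcript.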

📚 Documentation

Model Details

Property	Details
Model Type	Fine-tuned version of facebook/wav2vec2-large-lv60
Training Data	Not specified (listed as "None")

Evaluation Results

This model achieves the following results on the evaluation set:

Loss: 2.9550
Wer: 1.0657
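To make the Wer figure easier to read, the sketch below computes word error rate as word-level edit distance divided by the number of reference words. It is an illustrative implementation, not the evaluation script used for this model, and the example sentences are made up. It shows how an empty hypothesis yields exactly 1.0 and how extra wrong words push the value above 1.0.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Made-up examples, not data from this model's evaluation set.
print(word_error_rate("turn the light on", "turn the light on"))  # 0.0
print(word_error_rate("turn the light on", ""))                   # 1.0
print(word_error_rate("turn the light on", "a b c d e"))          # 1.25
```

Because insertions count as errors, WER has no upper bound of 1.0, which is why a value such as 1.0657 is possible.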

🔧 Technical Details

Training Hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 1
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 1000
num_epochs: 100
mixed_precision_training: Native AMP
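The learning-rate settings above (linear scheduler, 1000 warmup steps, peak 0.0001) can be sketched as a plain function. This is an illustration of a linear warmup and decay schedule, not the trainer's own code. The total of 14,800 optimizer steps is an assumption, estimated from the training log's roughly 148 steps per epoch over 100 epochs.

```python
def linear_schedule_lr(
    step: int,
    base_lr: float = 1e-4,       # learning_rate from the list above
    warmup_steps: int = 1000,    # lr_scheduler_warmup_steps from the list above
    total_steps: int = 14_800,   # assumption: ~148 steps/epoch * 100 epochs
) -> float:
    """Learning rate at a given optimizer step for linear warmup then linear decay."""
    if step < warmup_steps:
        # Warmup: rise linearly from 0 to base_lr.
        return base_lr * step / warmup_steps
    # Decay: fall linearly from base_lr at the end of warmup to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


for step in (0, 500, 1000, 7900, 14_800):
    print(step, linear_schedule_lr(step))
```

Under these assumptions the rate peaks at step 1000, which falls between the first and second evaluations in the training log.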

Training Results

Training Loss	Epoch	Step	Validation Loss	Wer
13.6613	3.38	500	3.2415	1.0
2.9524	6.76	1000	3.0199	1.0
2.9425	10.14	1500	3.0673	1.0
2.9387	13.51	2000	3.0151	1.0
2.9384	16.89	2500	3.0320	1.0
2.929	20.27	3000	2.9691	1.0
2.9194	23.65	3500	2.9596	1.0
2.9079	27.03	4000	2.9279	1.0
2.8957	30.41	4500	2.9647	1.0
2.8385	33.78	5000	2.8114	1.0193
2.6546	37.16	5500	2.6744	1.0983
2.5866	40.54	6000	2.6192	1.1071
2.5475	43.92	6500	2.5777	1.0950
2.5177	47.3	7000	2.5845	1.1220
2.482	50.68	7500	2.5730	1.1264
2.4343	54.05	8000	2.5722	1.0955
2.3754	57.43	8500	2.5781	1.1353
2.3055	60.81	9000	2.6177	1.0972
2.2446	64.19	9500	2.6351	1.1027
2.1625	67.57	10000	2.6924	1.0756
2.1078	70.95	10500	2.6817	1.0795
2.0366	74.32	11000	2.7629	1.0657
1.9899	77.7	11500	2.7972	1.0845
1.9309	81.08	12000	2.8450	1.0734
1.8861	84.46	12500	2.8703	1.0668
1.8437	87.84	13000	2.9308	1.0917
1.8192	91.22	13500	2.9298	1.0701
1.7952	94.59	14000	2.9488	1.0685
1.7745	97.97	14500	2.9550	1.0657
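The table can be inspected programmatically to see where validation loss bottoms out and whether any checkpoint gets below a WER of 1.0. The sketch below copies the Step, Training Loss, Validation Loss and Wer columns from the table; the variable names are illustrative.

```python
# (step, training_loss, validation_loss, wer), copied from the table above.
LOG = [
    (500, 13.6613, 3.2415, 1.0), (1000, 2.9524, 3.0199, 1.0),
    (1500, 2.9425, 3.0673, 1.0), (2000, 2.9387, 3.0151, 1.0),
    (2500, 2.9384, 3.0320, 1.0), (3000, 2.929, 2.9691, 1.0),
    (3500, 2.9194, 2.9596, 1.0), (4000, 2.9079, 2.9279, 1.0),
    (4500, 2.8957, 2.9647, 1.0), (5000, 2.8385, 2.8114, 1.0193),
    (5500, 2.6546, 2.6744, 1.0983), (6000, 2.5866, 2.6192, 1.1071),
    (6500, 2.5475, 2.5777, 1.0950), (7000, 2.5177, 2.5845, 1.1220),
    (7500, 2.482, 2.5730, 1.1264), (8000, 2.4343, 2.5722, 1.0955),
    (8500, 2.3754, 2.5781, 1.1353), (9000, 2.3055, 2.6177, 1.0972),
    (9500, 2.2446, 2.6351, 1.1027), (10000, 2.1625, 2.6924, 1.0756),
    (10500, 2.1078, 2.6817, 1.0795), (11000, 2.0366, 2.7629, 1.0657),
    (11500, 1.9899, 2.7972, 1.0845), (12000, 1.9309, 2.8450, 1.0734),
    (12500, 1.8861, 2.8703, 1.0668), (13000, 1.8437, 2.9308, 1.0917),
    (13500, 1.8192, 2.9298, 1.0701), (14000, 1.7952, 2.9488, 1.0685),
    (14500, 1.7745, 2.9550, 1.0657),
]

# Checkpoint with the lowest validation loss.
best_step, _, best_val_loss, best_wer = min(LOG, key=lambda row: row[2])
print(f"lowest validation loss {best_val_loss} at step {best_step} (Wer {best_wer})")

# Gap between validation and training loss at the last evaluation.
_, final_train, final_val, _ = LOG[-1]
print(f"final validation - training loss gap: {final_val - final_train:.4f}")

# Number of evaluations where the Wer is below 1.0.
below_one = [row for row in LOG if row[3] < 1.0]
print(f"evaluations with Wer < 1.0: {len(below_one)}")
```

Validation loss is lowest at step 8000 and rises afterwards while training loss keeps falling, a pattern consistent with overfitting; no evaluation reports a Wer below 1.0.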

Framework Versions

Transformers 4.11.3
PyTorch 1.9.1
Datasets 1.18.3
Tokenizers 0.10.3

📄 License

This project is licensed under the Apache-2.0 license.
