TimesFormer-BERT Video Captioning: An Open-Source Video Captioning Model - Easily Add Descriptive Captions to Videos

Timesformer Bert Video Captioning

Developed by AlexZigma

A video caption generation model based on the TimeSformer and BERT architectures that generates descriptive captions for video content.

Video-to-Text

Transformers

#Video Caption Generation #Multimodal Fusion #Temporal Video Understanding

Downloads 83

Release Time: 7/12/2023

Model Overview

This model combines the video understanding capabilities of TimeSformer with the language generation abilities of BERT to automatically generate descriptive captions for video content.

Model Features

Multimodal Understanding

Combines visual and language models to understand video content and generate corresponding captions.

Efficient Training

Trained for 2 epochs (about 3,200 optimizer steps) with the Adam optimizer and a linear learning rate scheduler.

Performance Optimization

Evaluation scores improve over the course of training: between step 200 and step 3200, ROUGE-1 rises from 25.47 to 30.05 and BLEU from 1.59 to 4.83.

Model Capabilities

Video Content Understanding

Automatic Caption Generation

Multimodal Data Processing

Use Cases

Media & Entertainment

Automatic Video Caption Generation

Automatically generates descriptive captions for video content to enhance accessibility.

ROUGE-1 score 30.0468, BLEU score 4.8298

Education

Educational Video Caption Generation

Automatically generates captions for educational videos to assist the learning process.

🚀 timesformer-bert-video-captioning

This is a video captioning model fine-tuned on an unspecified dataset. On the evaluation set it reaches a ROUGE-1 score of 30.0468 and a BLEU score of 4.8298.

🚀 Quick Start

This model is a fine-tuned version of a base model that the model card does not name; the card lists the training dataset as None. It achieves the following results on the evaluation set:

Loss: 1.2821
Rouge1: 30.0468
Rouge2: 8.4998
Rougel: 29.0632
Rougelsum: 29.0231
Bleu: 4.8298
Gen Len: 9.5332
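To make the Rouge1 figure easier to interpret, here is a minimal sketch of a ROUGE-1 F1 computation. It is a simplified illustration that assumes lowercased whitespace tokenization and no stemming, so it is not the exact scorer used to produce the numbers above. The function name `rouge1_f1` and the caption pair are hypothetical. The result is scaled to 0-100 to match the scale of the reported metrics.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1 on lowercased whitespace tokens, scaled to 0-100."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Clipped overlap: a unigram counts only as often as it appears in both texts.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 100 * 2 * precision * recall / (precision + recall)


# Hypothetical caption pair, not taken from the model's evaluation data.
score = rouge1_f1("a man plays guitar", "a man is playing a guitar")
print(round(score, 2))  # 60.0
```

Here 3 of the 4 candidate unigrams match the reference (precision 0.75) and 3 of the 6 reference unigrams are covered (recall 0.5), which gives an F1 of 60. A corpus-level Rouge1 of about 30 therefore indicates partial unigram overlap between generated and reference captions.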

📚 Documentation

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 2
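As an illustration of what `lr_scheduler_type: linear` implies for the learning rate, the sketch below computes a linearly decaying rate starting from the listed `learning_rate` of 5e-05. It rests on assumptions: zero warmup steps (the card lists none), decay to exactly zero at the end of training, and a total step count estimated from the results table, which logs step 3200 at epoch 1.96 of 2. The function `linear_lr` is a hypothetical helper, not part of any library.

```python
def linear_lr(step: int, total_steps: int, base_lr: float = 5e-5,
              warmup_steps: int = 0) -> float:
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Rough estimate only: epoch values in the table are rounded to two decimals.
estimated_total_steps = round(3200 / 1.96 * 2)

for step in (0, 1600, 3200):
    print(step, f"{linear_lr(step, estimated_total_steps):.2e}")
```

Under these assumptions the rate is roughly half its initial value around step 1600 and close to zero by the last logged step.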

Training results

Training Loss	Epoch	Step	Bleu	Gen Len	Validation Loss	Rouge1	Rouge2	Rougel	Rougelsum
2.4961	0.12	200	1.5879	9.5332	1.6548	25.4717	5.11	24.6679	24.6696
1.6561	0.25	400	2.3515	9.5332	1.5339	26.1748	5.9106	25.413	25.3958
1.5772	0.37	600	2.266	9.5332	1.4510	28.6891	6.0431	27.7387	27.8043
1.492	0.49	800	3.6517	9.5332	1.3760	29.0257	7.8515	28.3142	28.3036
1.4736	0.61	1000	3.4866	9.5332	1.3425	27.9774	6.2175	26.7783	26.7207
1.3856	0.74	1200	3.1649	9.5332	1.3118	27.3532	6.5569	26.4964	26.5087
1.3972	0.86	1400	3.5337	9.5332	1.2868	28.233	7.6471	27.3651	27.3354
1.374	0.98	1600	3.5737	9.5332	1.2571	28.8216	7.542	27.9166	27.9353
1.2207	1.1	1800	3.7983	9.5332	1.3362	29.9574	8.1088	28.8866	28.855
1.1861	1.23	2000	3.6521	9.5332	1.3295	30.072	7.7799	28.8417	28.864
1.1173	1.35	2200	3.9784	9.5332	1.3335	29.736	7.9661	28.6877	28.6974
1.1255	1.47	2400	4.3021	9.5332	1.3097	29.8176	8.4656	28.958	28.9571
1.0909	1.6	2600	4.4782	9.5332	1.3095	30.0233	8.4896	29.2562	29.2375
1.1205	1.72	2800	4.44	9.5332	1.2992	29.7164	8.007	28.5027	28.5018
1.1069	1.84	3000	4.6065	9.5332	1.2830	29.851	8.4312	28.8139	28.8205
1.076	1.96	3200	4.8298	9.5332	1.2821	30.0468	8.4998	29.0632	29.0231
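The table can also be read programmatically, for example to see which logged step scores best on each metric. The sketch below copies the step, Validation Loss, Rouge1, and Bleu values from the table into a list and picks the best entry per metric. The variable names are illustrative, and the card does not say whether intermediate checkpoints were saved.

```python
# (step, validation_loss, rouge1, bleu) copied from the training results table.
eval_log = [
    (200, 1.6548, 25.4717, 1.5879),
    (400, 1.5339, 26.1748, 2.3515),
    (600, 1.4510, 28.6891, 2.2660),
    (800, 1.3760, 29.0257, 3.6517),
    (1000, 1.3425, 27.9774, 3.4866),
    (1200, 1.3118, 27.3532, 3.1649),
    (1400, 1.2868, 28.2330, 3.5337),
    (1600, 1.2571, 28.8216, 3.5737),
    (1800, 1.3362, 29.9574, 3.7983),
    (2000, 1.3295, 30.0720, 3.6521),
    (2200, 1.3335, 29.7360, 3.9784),
    (2400, 1.3097, 29.8176, 4.3021),
    (2600, 1.3095, 30.0233, 4.4782),
    (2800, 1.2992, 29.7164, 4.4400),
    (3000, 1.2830, 29.8510, 4.6065),
    (3200, 1.2821, 30.0468, 4.8298),
]

# Lower is better for loss; higher is better for Rouge1 and Bleu.
best_loss = min(eval_log, key=lambda row: row[1])
best_rouge1 = max(eval_log, key=lambda row: row[2])
best_bleu = max(eval_log, key=lambda row: row[3])

print("lowest validation loss at step", best_loss[0])
print("highest Rouge1 at step", best_rouge1[0])
print("highest Bleu at step", best_bleu[0])
```

The three metrics peak at different steps: validation loss is lowest at step 1600, Rouge1 is highest at step 2000, and Bleu is highest at the final logged step, 3200, which is the one reported in the summary results.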

Framework versions

Transformers 4.30.2
Pytorch 2.0.1+cu118
Datasets 2.13.1
Tokenizers 0.13.3

Property	Details
Model Type	timesformer-bert-video-captioning
Metrics	rouge, bleu
