saved_model_git-base Open-source Vision-Language Model - Free Deployment for Accurate Image Captioning

Saved Model Git Base

Developed by holipori

A vision-language model fine-tuned on image folder datasets based on microsoft/git-base, primarily used for image caption generation tasks

Image-to-Text

Transformers

Other

Open Source License: MIT

#Image Caption Generation #Multimodal Model #Fine-tuning Optimization

Downloads 13

Release Time: 5/22/2023

Model Overview

This is a vision-language model based on the GIT architecture, fine-tuned to generate textual descriptions of input images. On its evaluation set, the final checkpoint reaches a Rouge1 of 0.3059, a Meteor of 0.4991, and a Bleu of 0.1058.

Model Features

Multimodal Understanding Capability

Capable of processing both visual and linguistic information simultaneously to understand image content and generate relevant descriptions

Fine-tuning Optimization

Fine-tuned on specific image datasets to enhance performance in target domains

Comprehensive Evaluation Metrics

Utilizes multiple text generation evaluation metrics (Rouge, Bleu, Meteor, etc.) for comprehensive assessment

Model Capabilities

Image Understanding

Text Generation

Multimodal Processing

Image Caption Generation

Use Cases

Assistive Technology

Visual Assistance Description

Generates textual descriptions of image content for visually impaired individuals

Content Creation

Social Media Content Generation

Automatically generates descriptive text for uploaded images

🚀 saved_model_git-base

This model is a fine-tuned version of [microsoft/git-base](https://huggingface.co/microsoft/git-base) on the imagefolder dataset. It generates captions for images in the domain of that dataset; its scores on the evaluation set are listed under Documentation.
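
A minimal captioning sketch using the Transformers classes documented for GIT models. The repository id `holipori/saved_model_git-base` is an assumption pieced together from the developer and model names on this page, so verify it before use. The processor is loaded from the base `microsoft/git-base` checkpoint on the assumption that fine-tuning left preprocessing and tokenization unchanged.

```python
# Assumed repository id (developer name + model name from this page); verify before use.
MODEL_ID = "holipori/saved_model_git-base"
# Base checkpoint the model was fine-tuned from; used here for preprocessing and decoding.
PROCESSOR_ID = "microsoft/git-base"


def caption_image(image_path, max_length=50):
    """Generate one caption for the image stored at image_path."""
    # Heavy dependencies are imported lazily so this module imports without them.
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    processor = AutoProcessor.from_pretrained(PROCESSOR_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    generated_ids = model.generate(pixel_values=pixel_values, max_length=max_length)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

Calling `caption_image("photo.jpg")`, where `photo.jpg` is a placeholder path, downloads both checkpoints on first use and returns the caption as a string.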

📚 Documentation

This model achieves the following results on the evaluation set:

Loss: 0.2473
Wer Score: 2.7325
Rouge1: 0.3059
Rouge2: 0.1738
Rougel: 0.2760
Rougelsum: 0.2759
Meteor: 0.4991
Bleu: 0.1058
Bleu1: 0.2113
Bleu2: 0.1272
Bleu3: 0.0824
Bleu4: 0.0566
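
A Wer Score above 1 can look like an error, but word error rate is normalized by the reference length. It exceeds 1 whenever the prediction needs more edits than the reference has words, typically because the generated caption is much longer than the reference. The sketch below uses the standard word-level edit-distance definition. The card does not say which implementation produced the reported value, so treat this as an illustration rather than the evaluation code that was used.

```python
def word_error_rate(reference, prediction):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = prediction.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] = edits needed to turn the reference words seen so far into hyp[:j]
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1      # reference word missing from the prediction
            insertion = current[j - 1] + 1  # extra word in the prediction
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Made-up captions: a two-word reference against a six-word prediction.
print(word_error_rate("a dog", "a dog running on the beach"))  # prints 2.0
```

In this made-up case the prediction contains four extra words against a two-word reference, giving four insertions divided by two reference words.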

Model Information

Property	Details
Model Type	Fine-tuned version of microsoft/git-base
Training Data	imagefolder dataset
Metrics	Rouge, Bleu
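
The card identifies the training data only as imagefolder, which is most likely the generic image-folder loader of the Datasets library rather than a named public dataset. As a rough sketch of how captioned images are usually laid out for that loader: a `metadata.jsonl` file sits next to the images and links each row to an image through the `file_name` column. The caption column name `text`, the file names, and the captions below are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def write_caption_metadata(image_dir, captions):
    """Write metadata.jsonl mapping each image file name to its caption.

    The imagefolder loader matches rows to images through `file_name`;
    the caption column name `text` is an assumption made for this sketch.
    """
    metadata_path = Path(image_dir) / "metadata.jsonl"
    with metadata_path.open("w", encoding="utf-8") as handle:
        for file_name, caption in captions.items():
            handle.write(json.dumps({"file_name": file_name, "text": caption}) + "\n")
    return metadata_path


def load_caption_dataset(image_dir):
    """Load the folder with the imagefolder builder (requires `datasets`)."""
    from datasets import load_dataset  # imported lazily: optional dependency

    return load_dataset("imagefolder", data_dir=str(image_dir), split="train")


# Hypothetical file names and captions, for illustration only.
example_dir = Path(tempfile.mkdtemp())
metadata_file = write_caption_metadata(
    example_dir,
    {"0001.png": "a dog on a beach", "0002.png": "two cats on a sofa"},
)
print(metadata_file.read_text(encoding="utf-8"))
```

`load_caption_dataset` is not called here because the example folder contains no image files.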

Model Index

Name: saved_model_git-base
Results:
- Task:
    Name: Causal Language Modeling
    Type: text-generation
  Dataset:
    Name: imagefolder
    Type: imagefolder
    Config: default
    Split: train
    Args: default
  Metrics:
  - Name: Rouge1
    Type: rouge
    Value: 0.3058988098589094
  - Name: Bleu
    Type: bleu
    Value: 0.10580263597345552
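
The Bleu value in the model index can be cross-checked against the Bleu1 to Bleu4 values reported in the evaluation results. Under the usual BLEU definition, the score is a brevity penalty multiplied by the geometric mean of the 1-gram to 4-gram precisions. The sketch below assumes the brevity penalty is 1, which is plausible when predictions are longer than references but is not stated in the card, and it works from the rounded four-decimal precisions.

```python
import math

# Final-checkpoint n-gram precisions as reported in this card (rounded to 4 decimals).
precisions = [0.2113, 0.1272, 0.0824, 0.0566]
reported_bleu = 0.10580263597345552  # Bleu value from the model index

# Geometric mean of the four precisions; brevity penalty assumed to be 1.
geometric_mean = math.exp(sum(math.log(p) for p in precisions) / len(precisions))

print(f"geometric mean of precisions: {geometric_mean:.4f}")
print(f"reported Bleu:                {reported_bleu:.4f}")
```

If the two printed numbers agree closely, the reported Bleu is consistent with a brevity penalty of 1, so the score is not being reduced for short predictions.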

🔧 Technical Details

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 112
eval_batch_size: 112
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 224
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 30
mixed_precision_training: Native AMP
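
Two of these settings are derived quantities, and a short calculation shows how they fit together. The total train batch size is the per-step batch size times the gradient accumulation steps, and a linear scheduler decays the learning rate from its initial value to zero over the run. The sketch assumes no warmup, since none is listed, and estimates the total number of optimizer steps from the last logged row of the training results (step 17000 at epoch 28.94). Both are assumptions rather than values stated in the card.

```python
LEARNING_RATE = 5e-05
TRAIN_BATCH_SIZE = 112
GRADIENT_ACCUMULATION_STEPS = 2
NUM_EPOCHS = 30

# Samples contributing to each optimizer update.
effective_batch_size = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS

# Rough estimate: the last logged row reports step 17000 at epoch 28.94.
steps_per_epoch = 17000 / 28.94
estimated_total_steps = round(steps_per_epoch * NUM_EPOCHS)


def linear_lr(step, total_steps, base_lr=LEARNING_RATE, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if warmup_steps and step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


print("effective batch size:", effective_batch_size)
print("estimated total steps:", estimated_total_steps)
print("learning rate at step 10000:", linear_lr(10000, estimated_total_steps))
```

The computed effective batch size matches the listed total_train_batch_size of 224.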

Training results

Training Loss	Epoch	Step	Validation Loss	Wer Score	Rouge1	Rouge2	Rougel	Rougelsum	Meteor	Bleu	Bleu1	Bleu2	Bleu3	Bleu4
0.774	1.7	1000	0.2771	3.5978	0.2206	0.1145	0.1981	0.1981	0.4163	0.0774	0.1712	0.0965	0.0580	0.0375
0.2763	3.4	2000	0.2537	3.6165	0.2273	0.1237	0.2050	0.2050	0.4374	0.0840	0.1757	0.1032	0.0642	0.0428
0.2567	5.11	3000	0.2423	3.5963	0.2317	0.1299	0.2105	0.2105	0.4500	0.0881	0.1790	0.1074	0.0681	0.0460
0.2447	6.81	4000	0.2349	3.5915	0.2352	0.1336	0.2136	0.2136	0.4573	0.0907	0.1812	0.1100	0.0706	0.0481
0.2357	8.51	5000	0.2297	3.5867	0.2364	0.1364	0.2158	0.2158	0.4617	0.0927	0.1820	0.1120	0.0726	0.0499
0.2287	10.21	6000	0.2258	3.5781	0.2393	0.1392	0.2183	0.2183	0.4681	0.0947	0.1837	0.1139	0.0745	0.0515
0.2228	11.91	7000	0.2223	3.5628	0.2413	0.1419	0.2208	0.2208	0.4734	0.0965	0.1853	0.1158	0.0762	0.0531
0.2173	13.62	8000	0.2200	3.5171	0.2459	0.1452	0.2249	0.2249	0.4779	0.0976	0.1860	0.1167	0.0773	0.0540
0.2132	15.32	9000	0.2184	3.5207	0.2461	0.1464	0.2253	0.2254	0.4804	0.0994	0.1885	0.1187	0.0789	0.0553
0.2085	17.02	10000	0.2174	3.5189	0.2484	0.1468	0.2259	0.2259	0.4842	0.0998	0.1895	0.1190	0.0791	0.0555
0.2027	18.72	11000	0.2179	3.2891	0.2656	0.1571	0.2411	0.2411	0.4952	0.1036	0.1970	0.1233	0.0820	0.0577
0.1961	20.43	12000	0.2213	3.3457	0.2610	0.1534	0.2367	0.2367	0.4900	0.1025	0.1962	0.1223	0.0810	0.0568
0.1886	22.13	13000	0.2260	2.9878	0.2914	0.1696	0.2628	0.2628	0.5028	0.1053	0.2040	0.1257	0.0828	0.0579
0.1797	23.83	14000	0.2305	3.0250	0.2874	0.1668	0.2597	0.2597	0.4987	0.1053	0.2051	0.1259	0.0827	0.0575
0.1713	25.53	15000	0.2376	2.7048	0.3125	0.1797	0.2822	0.2822	0.5062	0.1078	0.2125	0.1291	0.0843	0.0583
0.1646	27.23	16000	0.2438	2.7129	0.3087	0.1761	0.2786	0.2785	0.5021	0.1066	0.2120	0.1281	0.0831	0.0573
0.159	28.94	17000	0.2473	2.7325	0.3059	0.1738	0.2760	0.2759	0.4991	0.1058	0.2113	0.1272	0.0824	0.0566
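
The table shows validation loss and the text-overlap metrics peaking at different points in training, which matters when choosing a checkpoint. The sketch below copies the step, validation loss, Wer Score, and Rouge1 columns from the table and picks the best row under each criterion. The headline results in this card match the final row (step 17000), though the card does not state which checkpoint the published weights correspond to.

```python
# (step, validation_loss, wer_score, rouge1) copied from the training results table.
rows = [
    (1000, 0.2771, 3.5978, 0.2206),
    (2000, 0.2537, 3.6165, 0.2273),
    (3000, 0.2423, 3.5963, 0.2317),
    (4000, 0.2349, 3.5915, 0.2352),
    (5000, 0.2297, 3.5867, 0.2364),
    (6000, 0.2258, 3.5781, 0.2393),
    (7000, 0.2223, 3.5628, 0.2413),
    (8000, 0.2200, 3.5171, 0.2459),
    (9000, 0.2184, 3.5207, 0.2461),
    (10000, 0.2174, 3.5189, 0.2484),
    (11000, 0.2179, 3.2891, 0.2656),
    (12000, 0.2213, 3.3457, 0.2610),
    (13000, 0.2260, 2.9878, 0.2914),
    (14000, 0.2305, 3.0250, 0.2874),
    (15000, 0.2376, 2.7048, 0.3125),
    (16000, 0.2438, 2.7129, 0.3087),
    (17000, 0.2473, 2.7325, 0.3059),
]

best_by_loss = min(rows, key=lambda row: row[1])
best_by_wer = min(rows, key=lambda row: row[2])
best_by_rouge1 = max(rows, key=lambda row: row[3])
final = rows[-1]

print("lowest validation loss:", best_by_loss)
print("lowest Wer Score:", best_by_wer)
print("highest Rouge1:", best_by_rouge1)
print("final checkpoint:", final)
```

Validation loss is lowest at step 10000 and rises afterwards while training loss keeps falling, whereas Wer Score and Rouge1 are best at step 15000. Which checkpoint is preferable depends on whether loss or caption overlap is the target.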

Framework versions

Transformers 4.29.2
Pytorch 2.0.0
Datasets 2.12.0
Tokenizers 0.13.3
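
To compare a local environment against these versions, a small check with the standard library is enough. This is a sketch, not part of the original card. It assumes the usual PyPI distribution names, with `torch` standing for Pytorch.

```python
from importlib import metadata

# Versions listed in this card; "torch" is the PyPI distribution name for PyTorch.
PINNED = {
    "transformers": "4.29.2",
    "torch": "2.0.0",
    "datasets": "2.12.0",
    "tokenizers": "0.13.3",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for package, pinned in PINNED.items():
    found = installed_version(package)
    status = "ok" if found == pinned else "differs"
    print(f"{package}: card={pinned} installed={found} ({status})")
```

A build-tagged version such as a CUDA-specific Pytorch wheel reports as "differs" under this exact string comparison even when the base version matches.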

📄 License

This model is released under the MIT license.
