
Blip Base Captioning Ft Hl Actions

Developed by michelecafagna26
A BLIP-based image-to-text model, fine-tuned to generate captions that describe high-level actions in images.
Downloads 16
Release Time: 7/22/2023

Model Overview

The model was fine-tuned on the HL dataset to generate natural-language descriptions of the actions shown in images.

Model Features

High-level Action Description
Specifically generates descriptive text for high-level actions in images.
Fine-tuning Optimization
Fine-tuned for 6 epochs on the HL dataset to enhance action description capabilities.
Half-precision Training
Trained using fp16 half-precision to improve training efficiency.
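
The two training settings listed above (6 epochs, fp16) could be expressed with the Hugging Face Trainer API roughly as follows. This is only a sketch: the original training script is not shown here, so the helper name `build_training_args` and the `output_dir` default are hypothetical, and all other hyperparameters are left at library defaults rather than the values actually used.

```python
# Only these two values come from the model description;
# everything else below is an assumption for illustration.
HL_FINETUNE_SETTINGS = {
    "num_train_epochs": 6,  # "Fine-tuned for 6 epochs on the HL dataset"
    "fp16": True,           # "Trained using fp16 half-precision"
}


def build_training_args(output_dir="blip-hl-actions-checkpoints"):
    """Return TrainingArguments carrying the documented settings.

    `output_dir` is a hypothetical path. fp16 training generally
    requires a GPU, so this is meant to be called on a CUDA machine.
    """
    # Imported lazily so the settings above can be inspected
    # without transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HL_FINETUNE_SETTINGS)
```

Half precision mainly reduces memory use and speeds up training on supported GPUs; it does not by itself change what the model learns to describe.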

Model Capabilities

Image Understanding
Action Description Generation
Natural Language Generation

Use Cases

Image Captioning
Action Scene Description
Generates descriptive text for images containing human actions.
Produces natural language descriptions such as 'She is holding an umbrella.'
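
A minimal sketch of generating an action caption with the transformers library is shown here. It assumes the model is published on the Hugging Face Hub under the id `michelecafagna26/blip-base-captioning-ft-hl-actions` (inferred from the developer and model name above, not confirmed by this page) and that it loads with the standard BLIP classes. The function name and the `max_new_tokens` default are illustrative choices.

```python
# Assumed Hub id, inferred from the developer and model name.
MODEL_ID = "michelecafagna26/blip-base-captioning-ft-hl-actions"


def generate_action_caption(image, model_id=MODEL_ID, max_new_tokens=30):
    """Return a caption for a PIL image in RGB mode.

    The model and processor are loaded on every call to keep the
    sketch short; cache them if captioning many images.
    """
    # Lazy imports: torch and transformers are only needed at call time.
    import torch
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(model_id)
    model = BlipForConditionalGeneration.from_pretrained(model_id)
    model.eval()

    # Convert the image into the pixel tensor the model expects.
    inputs = processor(images=image, return_tensors="pt")

    # Inference only, so gradient tracking is disabled.
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    return processor.decode(output_ids[0], skip_special_tokens=True).strip()
```

To try it, open a local file with `PIL.Image.open(path).convert("RGB")` and pass the result to `generate_action_caption`. The first call downloads the model weights, so it needs network access.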