BLIP Base Captioning Ft Hl Actions
Apache-2.0
This is a BLIP-based image-to-text model, fine-tuned to generate captions that describe the high-level actions depicted in images.
Image-to-Text
Transformers, English
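
Since the card lists Transformers as the library and image-to-text as the task, captioning with this model might look roughly like the sketch below. It assumes the checkpoint loads with the standard `BlipProcessor` and `BlipForConditionalGeneration` classes, which is typical for BLIP captioning models but is not confirmed by this card. The repository ID is not given here, so `MODEL_ID` is a placeholder, and the helper names `load_blip` and `caption_image` are illustrative only.

```python
from typing import Any

# Assumption: this card does not state the Hub repository ID,
# so it is left as a placeholder for the reader to fill in.
MODEL_ID = "<hub-repo-id-of-this-checkpoint>"


def load_blip(model_id: str):
    """Load a BLIP captioning checkpoint and its processor (hypothetical helper)."""
    # Imported lazily so caption_image() can be used and tested
    # without the heavy dependencies installed.
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(model_id)
    model = BlipForConditionalGeneration.from_pretrained(model_id)
    return processor, model


def caption_image(image: Any, processor: Any, model: Any,
                  max_new_tokens: int = 30) -> str:
    """Generate a single caption for one image (e.g. a PIL.Image in RGB mode)."""
    # The processor resizes and normalizes the image into model inputs.
    inputs = processor(images=image, return_tensors="pt")
    # Unconditional captioning: no text prompt is passed to the model.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode the first (and only) generated sequence back to text.
    return processor.decode(output_ids[0], skip_special_tokens=True).strip()


# Example usage (needs transformers, torch, and Pillow; downloads the weights):
#   from PIL import Image
#   processor, model = load_blip(MODEL_ID)
#   image = Image.open("example.jpg").convert("RGB")
#   print(caption_image(image, processor, model))
```

Keeping the loading step separate from the captioning step means the model is loaded once and reused across many images. The `max_new_tokens` default of 30 is an arbitrary choice for short captions, not a value taken from this card.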