
Blip Image Captioning Base Mocha

Developed by moranyanuka
Official checkpoint of the BLIP base model, fine-tuned on the MS-COCO dataset with the MOCHA reinforcement learning framework to mitigate open-vocabulary caption hallucination.
Downloads: 88
Release Time: 12/19/2023

Model Overview

This model is an image-to-text model based on the BLIP architecture, designed for generating image captions. Fine-tuning with the MOCHA reinforcement learning framework reduces hallucinated content in its captions.
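A minimal inference sketch using the Hugging Face transformers API is shown below. The Hub repository id (derived from the developer and model name above) and the sample image URL are assumptions for illustration, not taken from this page.

```python
# Minimal captioning sketch with Hugging Face transformers.
import requests
from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor

repo_id = "moranyanuka/blip-image-captioning-base-mocha"  # assumed Hub repo id
processor = BlipProcessor.from_pretrained(repo_id)
model = BlipForConditionalGeneration.from_pretrained(repo_id)

# Load any RGB image; a COCO validation image is used here as an example.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

inputs = processor(images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))
```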

Model Features

MOCHA reinforcement learning fine-tuning
Fine-tuned with the MOCHA framework to mitigate hallucination in open-vocabulary captions
Dual-mode generation
Supports both conditional (text-prefixed) and unconditional image caption generation; see the sketch after this list
Multi-precision support
Runs on CPU or GPU, in full precision or half precision (float16)
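The sketch below illustrates both generation modes together with optional float16 inference on GPU, under the same assumptions as above (the repo id and image URL are illustrative).

```python
# Conditional vs. unconditional captioning, with float16 when a GPU is available.
import requests
import torch
from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32  # half precision on GPU

repo_id = "moranyanuka/blip-image-captioning-base-mocha"  # assumed Hub repo id
processor = BlipProcessor.from_pretrained(repo_id)
model = BlipForConditionalGeneration.from_pretrained(repo_id, torch_dtype=dtype).to(device)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # example image
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Conditional mode: the generated caption continues a supplied text prefix.
inputs = processor(images=image, text="a photography of", return_tensors="pt").to(device, dtype)
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))

# Unconditional mode: no prefix; the model captions the image from scratch.
inputs = processor(images=image, return_tensors="pt").to(device, dtype)
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))
```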

Model Capabilities

Image caption generation
Conditional text generation
Unconditional text generation
Multilingual image understanding

Use Cases

Content generation
Automatic image tagging
Automatically generates descriptive text for images in social media or content management systems
Produces accurate image captions with markedly reduced hallucination
Assisting visually impaired users
Provides textual descriptions of image content for visually impaired users
Enhances accessibility and aids in understanding visual content
Computer vision research
Vision-language model research
Serves as a baseline or comparison model for vision-language tasks
Provides a MOCHA-optimized reference point for benchmarking