Mini Image Captioning
Developed by cnmoro
A lightweight image captioning model based on bert-mini and vit-small. It is only 130 MB in size and runs fast on CPU.
Downloads 292
Release Time: 1/27/2025
Model Overview
This model pairs a lightweight vision encoder (ViT) with a lightweight text decoder (BERT) to generate descriptive text captions for input images.
Model Features
Lightweight and Efficient
The model is only 130 MB in size and is optimized for CPU inference (e.g., a caption takes only 0.19 seconds in the example).
Dual-Modal Architecture
Combines a Vision Transformer (ViT) image encoder with a BERT text decoder.
Adjustable Generation
Supports various generation strategies such as temperature sampling, top-p/top-k filtering, and beam search.
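To make the sampling-based strategies concrete, here is a minimal, dependency-free sketch of how temperature scaling, top-k filtering, and top-p (nucleus) filtering narrow down the next-token choice. It is an illustration of the general technique, not this model's actual decoding code: the function names and the toy logits are hypothetical, and beam search is not covered. If the model is loaded through Hugging Face transformers (an assumption; this page does not say), the corresponding `generate` arguments would be `do_sample`, `temperature`, `top_k`, `top_p`, and `num_beams`.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_top_k_top_p(probs, top_k=0, top_p=1.0):
    """Keep the top_k most likely tokens, then the smallest prefix of those
    whose renormalized cumulative probability reaches top_p.
    Returns {token_index: renormalized probability}."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]
    k_total = sum(probs[i] for i in ranked)

    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / k_total
        if cumulative >= top_p:
            break

    p_total = sum(probs[i] for i in kept)
    return {i: probs[i] / p_total for i in kept}


def sample_next_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=random):
    """Sample one token index from the filtered distribution."""
    probs = softmax(logits, temperature)
    candidates = filter_top_k_top_p(probs, top_k, top_p)
    indices = list(candidates)
    weights = [candidates[i] for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]


# Toy vocabulary of four tokens (hypothetical scores, not model output).
toy_logits = [2.0, 1.0, 0.5, -1.0]
next_token = sample_next_token(toy_logits, temperature=0.7, top_k=3, top_p=0.9)
```

Lower temperatures and tighter top-k/top-p values make captions more predictable, while looser settings allow more varied wording at the cost of occasional odd choices.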
Model Capabilities
Image Understanding
Natural Language Generation
Scene Description
Multimodal Processing
Use Cases
Content Generation
Social Media Image Tagging
Automatically generates descriptive text for uploaded social media images.
Produces coherent descriptions like 'A large crowd walking through a bustling city.'
Accessibility
Visual Impairment Assistance
Provides audio descriptions of image content for visually impaired users.