Nano Image Captioning
Developed by cnmoro
This is a lightweight image captioning model built on bert-tiny and vit-tiny. At only 40MB, it runs inference very quickly on CPU.
Downloads: 184
Release Time: 1/28/2025
Model Overview
The model combines a visual encoder (ViT-tiny) and a text decoder (BERT-tiny) to generate concise descriptive captions for input images.
Model Features
Lightweight and Efficient
The model is only 40MB in size and achieves fast inference on CPU (approximately 0.075 seconds per image).
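Per-image latency depends on the CPU, so it is worth measuring locally. Below is a minimal timing harness, assuming you already have some function that captions a single image. `benchmark_latency` and `caption_fn` are hypothetical names, not part of the model's published code. At the quoted ~0.075 s per image, sequential throughput works out to about 1 / 0.075 ≈ 13 images per second.

```python
import time
from statistics import mean, median
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def benchmark_latency(
    caption_fn: Callable[[T], object],  # hypothetical: any callable that captions one image
    images: Sequence[T],
    warmup: int = 3,
) -> Dict[str, float]:
    """Time caption_fn on each image and return latency statistics in seconds."""
    if not images:
        raise ValueError("need at least one image")
    # Warm-up calls keep one-off costs (lazy loading, caches) out of the timings.
    for img in images[:warmup]:
        caption_fn(img)
    latencies: List[float] = []
    for img in images:
        start = time.perf_counter()
        caption_fn(img)
        latencies.append(time.perf_counter() - start)
    avg = mean(latencies)
    return {
        "mean_s": avg,
        "median_s": median(latencies),
        # Sequential throughput is the reciprocal of mean latency.
        "images_per_s": 1.0 / avg if avg > 0 else float("inf"),
    }
```

For example, `benchmark_latency(my_caption_fn, images)` returns mean and median latency plus images per second. Reporting the median alongside the mean helps spot outliers caused by background load.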
Dual Tiny Architecture
Uses vit-tiny-patch16-224 as the visual encoder and bert_uncased_L-2_H-128_A-2 as the text decoder.
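The two checkpoint names encode the architecture. Under Google's BERT-miniature naming scheme, `L`, `H`, and `A` give the number of layers, the hidden size, and the number of attention heads, while `patch16-224` means the ViT splits a 224×224 input into 16×16 patches. The helpers below (`parse_bert_mini`, `vit_num_patches`) are illustrative only and not part of the model's code; they simply do that arithmetic.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class BertMiniSpec:
    layers: int
    hidden: int
    heads: int


def parse_bert_mini(name: str) -> BertMiniSpec:
    """Decode the BERT-miniature naming convention: L = layers, H = hidden size, A = heads."""
    m = re.search(r"L-(\d+)_H-(\d+)_A-(\d+)", name)
    if m is None:
        raise ValueError(f"unrecognized checkpoint name: {name}")
    layers, hidden, heads = map(int, m.groups())
    return BertMiniSpec(layers, hidden, heads)


def vit_num_patches(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches a ViT cuts the input image into."""
    if image_size % patch_size:
        raise ValueError("image size must be divisible by patch size")
    return (image_size // patch_size) ** 2


decoder = parse_bert_mini("bert_uncased_L-2_H-128_A-2")
patches = vit_num_patches(224, 16)
print(decoder)                          # BertMiniSpec(layers=2, hidden=128, heads=2)
print(patches)                          # 196 (a 14 x 14 grid of patches)
print(decoder.hidden // decoder.heads)  # 64 dimensions per attention head
```

A standard ViT also prepends a [CLS] token, so the decoder cross-attends over roughly 197 encoder tokens per image, which is part of why such a small pair stays fast on CPU.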
Optimized Inference Settings
Supports several decoding strategies, including temperature sampling, top-p/top-k filtering, and beam search.
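The sampling-based strategies all act on the decoder's next-token logits at each generation step. The sketch below shows the generic technique in plain Python; it is not the model's own generation code, `filtered_probs` and `sample_token` are hypothetical names, and applying top-k before top-p is one common ordering, not necessarily the one this model uses. Beam search, which keeps several candidate captions instead of sampling one token, is not covered here.

```python
import math
import random
from typing import List, Optional, Sequence


def _renormalize(probs: List[float]) -> List[float]:
    total = sum(probs)
    return [p / total for p in probs]


def filtered_probs(
    logits: Sequence[float],
    temperature: float = 1.0,
    top_k: int = 0,      # 0 disables top-k filtering
    top_p: float = 1.0,  # 1.0 disables nucleus (top-p) filtering
) -> List[float]:
    """Convert next-token logits into a filtered, renormalized sampling distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Temperature: divide logits before softmax (<1 sharpens, >1 flattens).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for a numerically stable softmax
    probs = _renormalize([math.exp(x - peak) for x in scaled])

    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if 0 < top_k < len(probs):
        # Top-k: keep only the k most likely tokens.
        kept = set(order[:top_k])
        probs = _renormalize([p if i in kept else 0.0 for i, p in enumerate(probs)])
    if top_p < 1.0:
        # Top-p: keep the smallest high-probability set whose mass reaches top_p.
        # Renormalizing after top-k preserves the ranking, so `order` is still valid.
        kept, cumulative = set(), 0.0
        for i in order:
            kept.add(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        probs = _renormalize([p if i in kept else 0.0 for i, p in enumerate(probs)])
    return probs


def sample_token(
    logits: Sequence[float],
    temperature: float = 1.0,
    top_k: int = 0,
    top_p: float = 1.0,
    rng: Optional[random.Random] = None,
) -> int:
    """Draw one token index from the filtered distribution."""
    if rng is None:
        rng = random.Random()
    probs = filtered_probs(logits, temperature, top_k, top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Setting `top_k=1` reduces this to greedy decoding, while lower temperatures and smaller `top_p` values trade caption variety for more conservative wording.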
Model Capabilities
Image Understanding
Natural Language Generation
Real-Time Caption Generation
Use Cases
Accessibility Technology
Image Description Generation
Automatically generates text descriptions of images for visually impaired users.
Produces concise and accurate image descriptions (e.g., 'A group of people standing in a city center').
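On the web, a generated caption typically ends up in an image's `alt` attribute, which screen readers announce. A minimal sketch, assuming captions arrive as plain strings: `caption_to_img_tag` is a hypothetical helper, and capitalizing the first letter is an assumption based on the decoder being an uncased BERT variant that likely emits lowercase text.

```python
from html import escape


def caption_to_img_tag(src: str, caption: str) -> str:
    """Build an HTML <img> element whose alt text is a generated caption.

    Hypothetical helper: whitespace is collapsed and the first letter capitalized,
    assuming an uncased decoder emits lowercase text.
    """
    alt = " ".join(caption.split())
    alt = alt[:1].upper() + alt[1:]
    # escape() with its default quote=True makes both values safe inside quoted attributes.
    return f'<img src="{escape(src)}" alt="{escape(alt)}">'
```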
Content Management
Automatic Image Tagging
Automatically generates tags and descriptions for gallery or social media images.
Quickly generates searchable metadata.
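One simple way to turn captions into searchable tags is to drop common function words and keep the remaining terms. The sketch below assumes that approach; `caption_to_tags` and its short stopword list are illustrative, not something the model itself provides.

```python
import re
from typing import List

# Illustrative stopword list only; a production system would use a fuller one.
STOPWORDS = frozenset(
    {"a", "an", "the", "of", "in", "on", "at", "with", "and", "is", "are", "to", "for", "by"}
)


def caption_to_tags(caption: str) -> List[str]:
    """Extract lowercase, de-duplicated keyword tags from a caption, in order of appearance."""
    words = re.findall(r"[a-z0-9]+", caption.lower())
    # dict.fromkeys removes duplicates while preserving first-seen order.
    return list(dict.fromkeys(w for w in words if w not in STOPWORDS))


print(caption_to_tags("A group of people standing in a city center"))
# ['group', 'people', 'standing', 'city', 'center']
```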