Vit2distilgpt2
Developed by sachin
An image-to-text generation model that takes an image as input and generates descriptive text.
Downloads: 49
Release Time: 3/2/2022
Model Overview
The model pairs a ViT (Vision Transformer) image encoder with a DistilGPT2 text decoder. It is designed for image captioning and was trained on the COCO 2017 dataset.
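A minimal sketch of running a ViT-encoder/DistilGPT2-decoder captioner with the Hugging Face `transformers` library follows. It assumes the checkpoint is published on the Hub as `sachin/vit2distilgpt2`, and that it expects the preprocessing of `google/vit-base-patch16-224-in21k` and the `distilgpt2` tokenizer. None of these identifiers, nor the generation settings (`max_length`, `num_beams`), come from this page; treat them as assumptions. It also needs `torch`, `transformers`, and `Pillow` installed.

```python
# Assumed identifiers (not confirmed by this page): adjust to the actual checkpoint.
MODEL_ID = "sachin/vit2distilgpt2"
IMAGE_PROCESSOR_ID = "google/vit-base-patch16-224-in21k"
TOKENIZER_ID = "distilgpt2"


def caption_image(image_path: str, max_length: int = 32, num_beams: int = 4) -> str:
    """Generate one caption for an image with a ViT encoder + DistilGPT2 decoder."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    from PIL import Image
    from transformers import AutoTokenizer, ViTImageProcessor, VisionEncoderDecoderModel

    model = VisionEncoderDecoderModel.from_pretrained(MODEL_ID)
    processor = ViTImageProcessor.from_pretrained(IMAGE_PROCESSOR_ID)
    tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_ID)

    image = Image.open(image_path).convert("RGB")
    # The ViT encoder consumes a normalized pixel tensor of shape (1, 3, H, W).
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    # The decoder generates token ids while cross-attending to the encoder output.
    output_ids = model.generate(pixel_values, max_length=max_length, num_beams=num_beams)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True).strip()
```

For example, `caption_image("photo.jpg")` returns a single caption string. Loading the model inside the function keeps the sketch short; for repeated calls, load the model, processor, and tokenizer once and reuse them.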
Model Features
Vision-Language Joint Model
Combines a visual encoder (ViT) with a language decoder (DistilGPT2) to convert images into text
Trained on COCO Dataset
Trained on COCO 2017, a widely used image captioning dataset, which is intended to help the model generalize to everyday scenes
Lightweight Architecture
Uses DistilGPT2 as the decoder; with 6 transformer blocks instead of the 12 in GPT-2 (small), it is lighter than a decoder built on the full GPT-2
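To put the "lightweight" claim in numbers, the sketch below counts the parameters of a GPT-2-style decoder. It assumes the standard GPT-2 small hyperparameters (50,257-token vocabulary, 1,024 positions, hidden size 768, tied input/output embeddings) and that DistilGPT2 keeps these but uses 6 blocks instead of 12. It ignores the cross-attention layers an encoder-decoder setup adds to each decoder block, so it describes the bare language models rather than this captioning checkpoint.

```python
def gpt2_param_count(n_layer: int, vocab: int = 50257, n_ctx: int = 1024, d: int = 768) -> int:
    """Count parameters of a GPT-2-style decoder with tied input/output embeddings."""
    embeddings = vocab * d + n_ctx * d            # token + position embedding tables
    attn = d * 3 * d + 3 * d + d * d + d          # fused QKV projection + output projection (weights + biases)
    mlp = d * 4 * d + 4 * d + 4 * d * d + d       # two linear layers with hidden size 4*d
    layer_norms = 2 * (2 * d)                     # two LayerNorms per block, each with gain and bias
    per_block = attn + mlp + layer_norms
    final_ln = 2 * d                              # final LayerNorm after the last block
    return embeddings + n_layer * per_block + final_ln


if __name__ == "__main__":
    full = gpt2_param_count(n_layer=12)   # GPT-2 small
    distil = gpt2_param_count(n_layer=6)  # DistilGPT2
    print(f"GPT-2 (12 blocks): {full:,}")       # GPT-2 (12 blocks): 124,439,808
    print(f"DistilGPT2 (6 blocks): {distil:,}")  # DistilGPT2 (6 blocks): 81,912,576
    print(f"ratio: {distil / full:.2f}")         # ratio: 0.66
```

The embedding tables (about 39.4 million parameters) are identical in both models, so halving the number of blocks saves roughly a third of the parameters overall rather than half.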
Model Capabilities
Image Understanding
Text Generation
Image Caption Generation
Use Cases
Assistive Technology
Visual Assistance
Generates image descriptions for visually impaired individuals
Content Generation
Social Media Content Auto-Generation
Automatically generates descriptive text for uploaded images
© 2025 AIbase