ProLIP ViT-B/16 DC-1B 12.8B
Probabilistic Language-Image Pretraining (ProLIP) ViT-B/16 model pretrained on the DataComp 1B dataset
Release date: October 18, 2024
Model Overview
This vision-language model is trained with Probabilistic Language-Image Pretraining (ProLIP) and handles image classification and cross-modal retrieval, with particular strength in zero-shot scenarios.
Model Features
Probabilistic Modeling
Models image and text embeddings as probability distributions rather than single points, enabling the quantification of prediction uncertainty.
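As a conceptual sketch of this idea (not ProLIP's actual architecture or training objective), the snippet below shows how an encoder head can output a Gaussian embedding: a mean vector plus a per-dimension log-variance, where larger variance signals higher uncertainty. The class name and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class ProbabilisticHead(nn.Module):
    """Illustrative head mapping backbone features to a Gaussian embedding.

    A conceptual sketch of probabilistic embeddings, not ProLIP's actual
    implementation; dimensions and names are made up for this example.
    """
    def __init__(self, in_dim: int = 768, embed_dim: int = 512):
        super().__init__()
        self.mean = nn.Linear(in_dim, embed_dim)     # center of the distribution
        self.log_var = nn.Linear(in_dim, embed_dim)  # per-dimension uncertainty

    def forward(self, features: torch.Tensor):
        return self.mean(features), self.log_var(features)

head = ProbabilisticHead()
feats = torch.randn(4, 768)           # stand-in for backbone features
mu, log_var = head(feats)
uncertainty = log_var.exp().mean(-1)  # one scalar uncertainty per sample
print(mu.shape, uncertainty)
```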
Large-scale Pretraining
Pretrained on the DataComp 1B dataset (1.28 billion image-text pairs) with 12.8 billion seen samples.
Zero-shot Learning Capability
Performs well on new tasks without fine-tuning, supporting zero-shot image classification and retrieval.
Uncertainty Awareness
Capable of outputting uncertainty estimates for image and text features, improving prediction reliability.
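One way such uncertainty estimates can improve reliability, sketched here under assumed tensor shapes rather than any official API, is to abstain from predictions whose uncertainty exceeds a threshold:

```python
import torch

def classify_with_abstention(logits, uncertainty, threshold=1.5):
    """Return predicted class indices, or -1 where the model abstains.

    `uncertainty` is assumed to be a per-sample scalar (e.g., the mean
    variance of the probabilistic embedding); the threshold is illustrative.
    """
    preds = logits.argmax(dim=-1)
    return torch.where(uncertainty <= threshold, preds, torch.full_like(preds, -1))

logits = torch.randn(4, 10)
uncertainty = torch.tensor([0.3, 2.0, 0.8, 1.7])
print(classify_with_abstention(logits, uncertainty))  # -1 marks abstentions
```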
Model Capabilities
Zero-shot Image Classification
Cross-modal Retrieval
Uncertainty Estimation
Multimodal Feature Extraction
Use Cases
Image Understanding
Zero-shot Image Classification
Classifies images into categories unseen during training, without task-specific fine-tuning.
Achieves 74.6% top-1 accuracy on ImageNet-1k.
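The mechanics of zero-shot classification with any CLIP-style model can be sketched as follows; the embeddings here are random stand-ins rather than outputs of the actual ProLIP encoders:

```python
import torch
import torch.nn.functional as F

def zero_shot_classify(image_emb, class_text_embs):
    """Pick the class whose text embedding is most similar to the image.

    `image_emb`: (D,) image embedding; `class_text_embs`: (C, D) embeddings
    of prompts like "a photo of a {class}". Both are placeholder tensors.
    """
    image_emb = F.normalize(image_emb, dim=-1)
    class_text_embs = F.normalize(class_text_embs, dim=-1)
    sims = class_text_embs @ image_emb  # (C,) cosine similarities
    return sims.argmax().item()

image_emb = torch.randn(512)
class_text_embs = torch.randn(1000, 512)  # e.g., 1000 ImageNet-1k prompts
print(zero_shot_classify(image_emb, class_text_embs))
```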
Cross-modal Retrieval
Image-Text Retrieval
Retrieves relevant images based on text queries or relevant text based on images.
Achieves 59.6% zero-shot retrieval performance.
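A minimal retrieval sketch, again with stand-in embeddings in place of real ProLIP outputs: rank gallery items by cosine similarity to the query and take the top-k.

```python
import torch
import torch.nn.functional as F

def retrieve_top_k(query_emb, gallery_embs, k=5):
    """Rank gallery items by cosine similarity to a query.

    Works symmetrically for text-to-image and image-to-text retrieval;
    embeddings are placeholders for the model's actual outputs.
    """
    query_emb = F.normalize(query_emb, dim=-1)
    gallery_embs = F.normalize(gallery_embs, dim=-1)
    sims = gallery_embs @ query_emb
    return sims.topk(k).indices.tolist()

query = torch.randn(512)            # e.g., embedding of "a dog on a beach"
gallery = torch.randn(10_000, 512)  # embeddings of an image collection
print(retrieve_top_k(query, gallery))
```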
Robustness Evaluation
Distribution Shift Evaluation
Evaluates model robustness on distribution-shifted ImageNet variants.
Achieves 63.0% accuracy.
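A bare-bones accuracy computation for such an evaluation might look like the following; the score and label tensors are placeholders for a real pass over a distribution-shifted test set:

```python
import torch

def top1_accuracy(scores, labels):
    """Compute top-1 accuracy from image-vs-class similarity scores.

    `scores`: (N, C) similarities between N images and C class prompts;
    `labels`: (N,) ground-truth indices. Both are stand-ins; a real run
    would iterate a shifted benchmark such as ImageNet-R or ImageNet-Sketch.
    """
    return (scores.argmax(dim=-1) == labels).float().mean().item()

scores = torch.randn(8, 1000)
labels = torch.randint(0, 1000, (8,))
print(f"top-1 accuracy: {top1_accuracy(scores, labels):.3f}")
```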