
ProLIP ViT-B/16 DC-1B 12.8B

Developed by SanghyukChun
Probabilistic Language-Image Pretraining (ProLIP) ViT-B/16 model pretrained on the DataComp 1B dataset
Downloads: 460
Release Date: 10/18/2024

Model Overview

This vision-language model is trained with Probabilistic Language-Image Pretraining (ProLIP). It handles image classification and cross-modal retrieval, and is particularly strong in zero-shot scenarios.

Model Features

Probabilistic Modeling
Uses a probabilistic approach to model image and text feature distributions, enabling the quantification of prediction uncertainty.
Large-scale Pretraining
Pretrained on the DataComp 1B dataset for 12.8 billion seen samples.
Zero-shot Learning Capability
Performs well on new tasks without fine-tuning, supporting zero-shot image classification and retrieval.
Uncertainty Awareness
Outputs uncertainty estimates for image and text features, improving prediction reliability (sketched below).
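
The probabilistic modeling and uncertainty awareness described above can be pictured with a minimal PyTorch sketch: each encoder feature is mapped to a Gaussian embedding (a mean plus a log-variance), and a scalar uncertainty is read off as the total variance. `ProbabilisticHead`, the dimensions, and the variance-sum score are illustrative assumptions, not the official ProLIP implementation.

```python
import torch
import torch.nn as nn

class ProbabilisticHead(nn.Module):
    """Hypothetical head mapping a backbone feature to a Gaussian embedding."""

    def __init__(self, in_dim: int = 768, embed_dim: int = 512):
        super().__init__()
        self.mean_proj = nn.Linear(in_dim, embed_dim)    # mu head
        self.logvar_proj = nn.Linear(in_dim, embed_dim)  # log(sigma^2) head

    def forward(self, feats: torch.Tensor):
        return self.mean_proj(feats), self.logvar_proj(feats)

# Toy features standing in for ViT-B/16 image/text encoder outputs.
image_feats = torch.randn(4, 768)  # batch of 4 images
text_feats = torch.randn(4, 768)   # batch of 4 captions

head = ProbabilisticHead()
img_mean, img_logvar = head(image_feats)
txt_mean, txt_logvar = head(text_feats)

# One scalar uncertainty per input: total variance of its embedding.
img_uncertainty = img_logvar.exp().sum(dim=-1)  # shape: (4,)
txt_uncertainty = txt_logvar.exp().sum(dim=-1)
print(img_uncertainty, txt_uncertainty)  # larger -> less certain embedding
```

In a setup like this, the mean can be used like a regular embedding at inference time, while the variance flags ambiguous inputs.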

Model Capabilities

Zero-shot Image Classification
Cross-modal Retrieval
Uncertainty Estimation
Multimodal Feature Extraction

Use Cases

Image Understanding
Zero-shot Image Classification
Classifies images into unseen categories without task-specific training (sketched below).
Achieves 74.6% top-1 accuracy on ImageNet-1k.
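
A hedged sketch of the zero-shot mechanism: embed one prompt per class, embed the image, and predict the class whose prompt embedding is most similar. The `encode_image`/`encode_text` functions below are random stand-ins for the model's encoders (which would return the mean of each probabilistic embedding), not a real API.

```python
import torch
import torch.nn.functional as F

# Random stand-ins for the model's encoders (hypothetical, not a real API).
def encode_image(images: torch.Tensor) -> torch.Tensor:
    return torch.randn(images.shape[0], 512)

def encode_text(prompts: list[str]) -> torch.Tensor:
    return torch.randn(len(prompts), 512)

classes = ["cat", "dog", "car"]
prompts = [f"a photo of a {c}" for c in classes]

# L2-normalize so dot products are cosine similarities.
img_emb = F.normalize(encode_image(torch.zeros(2, 3, 224, 224)), dim=-1)
txt_emb = F.normalize(encode_text(prompts), dim=-1)

logits = img_emb @ txt_emb.T      # (num_images, num_classes)
pred = logits.argmax(dim=-1)      # most similar class prompt per image
print([classes[i] for i in pred.tolist()])
```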
Cross-modal Retrieval
Image-Text Retrieval
Retrieves relevant images from text queries, or relevant text from images (sketched below).
Achieves a zero-shot retrieval score of 59.6%.
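
Retrieval reuses the same embedding space: score a text query against a gallery of image embeddings and return the top-ranked items. Again a sketch with random stand-in features rather than real model outputs.

```python
import torch
import torch.nn.functional as F

# Random stand-in embeddings; in practice these come from the encoders.
gallery = F.normalize(torch.randn(1000, 512), dim=-1)  # 1000 candidate images
query = F.normalize(torch.randn(1, 512), dim=-1)       # one text query

scores = (query @ gallery.T).squeeze(0)  # cosine similarity per image
top5 = scores.topk(k=5)                  # highest-scoring candidates
print(top5.indices.tolist())             # gallery indices of retrieved images
```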
Robustness Evaluation
Distribution Shift Evaluation
Evaluates model robustness on ImageNet distribution-shift benchmarks (a toy scoring sketch follows below).
Achieves 63.0% accuracy.
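
Robustness evaluation is the same zero-shot classifier run over shifted test sets such as ImageNet-V2 or ImageNet-R, scored by top-1 accuracy. A toy sketch of the scoring step, with random tensors standing in for model outputs:

```python
import torch

def top1_accuracy(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """Fraction of examples whose highest-scoring class is the true one."""
    return (logits.argmax(dim=-1) == labels).float().mean().item()

# Toy batch standing in for a distribution-shifted test set.
logits = torch.randn(8, 1000)           # image-to-class-prompt similarities
labels = torch.randint(0, 1000, (8,))   # ground-truth class indices
print(top1_accuracy(logits, labels))
```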