ViT-H-14 CLIPA DataComp-1B
A CLIPA-v2 model: an efficient contrastive vision-language model designed for zero-shot image classification.
Released: 10/17/2023
Model Overview
This model is a contrastive vision-language model built on the CLIPA-v2 architecture and is primarily used for zero-shot image classification. Contrastive training maps images and text into a shared embedding space, so new categories can be classified at inference time without any task-specific training.
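A minimal sketch of zero-shot classification with the open_clip library, assuming the model is published on the Hugging Face Hub as UCSC-VLAA/ViT-H-14-CLIPA-datacomp1B; the hub id, image path, and candidate labels below are illustrative assumptions, not part of this page:

```python
# Zero-shot classification sketch (hub id, image path, and labels are assumptions).
import torch
from PIL import Image
import open_clip

# Load the model with its preprocessing transform and the matching tokenizer.
model, preprocess = open_clip.create_model_from_pretrained(
    'hf-hub:UCSC-VLAA/ViT-H-14-CLIPA-datacomp1B')
tokenizer = open_clip.get_tokenizer('hf-hub:UCSC-VLAA/ViT-H-14-CLIPA-datacomp1B')

image = preprocess(Image.open('example.jpg')).unsqueeze(0)  # placeholder image
labels = ['a diagram', 'a dog', 'a cat', 'a beignet']       # placeholder labels
text = tokenizer(labels)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # L2-normalize so the dot product below is cosine similarity.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    # A scaled softmax over label similarities yields zero-shot probabilities.
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(dict(zip(labels, probs[0].tolist())))
```

The label whose text embedding is most similar to the image embedding is the predicted class; no fine-tuning is involved.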
Model Features
Efficient zero-shot classification: performs image classification without any task-specific training.
Large-scale training data: trained on the mlfoundations/datacomp_1b dataset.
High accuracy: reaches 81.1% zero-shot top-1 accuracy on ImageNet.
Cost-effective: delivers this performance at a comparatively low training cost.
Model Capabilities
Zero-shot image classification
Image-text matching
Multimodal feature extraction
Use Cases
Image classification
Zero-shot object recognition: recognizes objects from new categories without additional training. Example: accurately identified a French doughnut (beignet).
Multimodal applications
Image search: retrieves relevant images from natural-language text queries (see the sketch below).
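As a sketch of the image-search use case, under the same assumptions as the classification example above (and reusing its model, preprocess, and tokenizer), a small gallery of images can be ranked against a text query by cosine similarity; the gallery paths and query string are placeholders:

```python
# Hypothetical image-search sketch: rank gallery images against a text query.
# Reuses `model`, `preprocess`, and `tokenizer` from the classification sketch.
import torch
from PIL import Image

gallery_paths = ['img0.jpg', 'img1.jpg', 'img2.jpg']  # placeholder files
images = torch.stack([preprocess(Image.open(p)) for p in gallery_paths])
query = tokenizer(['a photo of a beignet'])            # placeholder query

with torch.no_grad():
    img_emb = model.encode_image(images)
    txt_emb = model.encode_text(query)
    # Normalize so the matrix product gives cosine similarities.
    img_emb /= img_emb.norm(dim=-1, keepdim=True)
    txt_emb /= txt_emb.norm(dim=-1, keepdim=True)
    scores = (img_emb @ txt_emb.T).squeeze(1)  # one score per gallery image

# Print images from most to least relevant to the query.
for score, path in sorted(zip(scores.tolist(), gallery_paths), reverse=True):
    print(f'{path}: {score:.3f}')
```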