ViT-H-14-CLIPA-336-datacomp1B
An efficient contrastive vision-language model based on CLIPA-v2, focused on zero-shot image classification.
Downloads: 493
Release date: 10/17/2023
Model Overview
This model is based on the CLIPA-v2 architecture. It learns a joint representation of images and text through contrastive training, which makes it particularly suitable for zero-shot image classification.
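The contrastive objective described above can be illustrated with a minimal sketch. This is a generic CLIP-style symmetric InfoNCE loss written in plain numpy, not code from the CLIPA-v2 repository; the function name and temperature value are illustrative choices.

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    img_emb, txt_emb: (batch, dim) arrays; row i of each is a matched
    image-text pair. Illustrative sketch, not the CLIPA-v2 implementation.
    """
    # L2-normalize so dot products are cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)

    logits = img @ txt.T / temperature        # (batch, batch) similarity matrix
    labels = np.arange(len(logits))           # matched pairs lie on the diagonal

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # Average the image-to-text and text-to-image directions
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))
```

Training pulls each image embedding toward its paired caption embedding and pushes it away from every other caption in the batch, which is what later enables classification from text prompts alone.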
Model Features
Efficient Zero-shot Classification
Achieves high-accuracy zero-shot ImageNet classification (81.1% accuracy) under a limited training budget
Inverse Scaling Optimization
Adopts an inverse-scaling training strategy that efficiently balances computational cost and model performance
Large-scale Data Training
Trained on the datacomp_1b dataset, with strong generalization capabilities
Model Capabilities
Zero-shot Image Classification
Image-Text Matching
Cross-modal Feature Extraction
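The capabilities above all reduce to comparing embeddings in the shared space. A minimal sketch of zero-shot classification at inference time, assuming the image embedding and per-class text-prompt embeddings have already been produced by the model's image and text encoders (e.g. by encoding prompts such as "a photo of a {label}"); the function name and temperature are illustrative:

```python
import numpy as np

def zero_shot_classify(image_emb, class_embs, temperature=0.01):
    """Score one image embedding against per-class prompt embeddings.

    image_emb: (dim,) vector from the image encoder.
    class_embs: (num_classes, dim) matrix of encoded class prompts.
    Returns a probability per class label.
    """
    # Normalize so the dot product is cosine similarity
    img = image_emb / np.linalg.norm(image_emb)
    txt = class_embs / np.linalg.norm(class_embs, axis=1, keepdims=True)

    sims = txt @ img / temperature   # scaled cosine similarity per class
    sims -= sims.max()               # numerically stable softmax
    probs = np.exp(sims) / np.exp(sims).sum()
    return probs
```

The predicted class is simply `probs.argmax()`; no class-specific training is needed, only a text prompt per candidate label.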
Use Cases
Image Understanding
Zero-shot Image Classification
Classifies images into new categories without category-specific training
Achieves 81.1% accuracy on ImageNet
Content Moderation
Inappropriate Content Detection
Identifies non-compliant image content