
ViT-H-14-CLIPA-336-datacomp1B

Developed by UCSC-VLAA
A CLIPA-v2 model: an efficient contrastive vision-language model focused on zero-shot image classification.
Released: 10/17/2023

Model Overview

This model is based on the CLIPA-v2 architecture. It learns a joint representation of images and text through contrastive training, which makes it particularly well suited to zero-shot image classification.
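The contrastive training objective mentioned above can be sketched in a few lines. This is a minimal NumPy illustration of the CLIP-style symmetric InfoNCE loss, not the model's actual implementation; the fixed temperature value is an assumption (in practice it is a learned parameter).

```python
import numpy as np

def l2_normalize(x):
    # Project embeddings onto the unit sphere, as CLIP-style models do.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def log_softmax(x, axis):
    # Numerically stable log-softmax.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    # Cosine-similarity logits between every image and every caption;
    # matched image-text pairs sit on the diagonal of the matrix.
    logits = l2_normalize(image_emb) @ l2_normalize(text_emb).T / temperature
    diag = np.arange(logits.shape[0])
    # Symmetric InfoNCE: cross-entropy over rows (image -> text)
    # and over columns (text -> image), averaged.
    loss_i2t = -log_softmax(logits, axis=1)[diag, diag].mean()
    loss_t2i = -log_softmax(logits, axis=0)[diag, diag].mean()
    return (loss_i2t + loss_t2i) / 2
```

Minimizing this loss pulls matched image and caption embeddings together while pushing mismatched pairs apart, which is what makes the shared embedding space usable for zero-shot tasks.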

Model Features

Efficient Zero-shot Classification
Achieves high-accuracy zero-shot ImageNet classification (81.1% top-1) on a limited training budget
Inverse Scaling Optimization
Exploits CLIPA's inverse scaling law (larger encoders tolerate shorter image and text token sequences during training) to trade little accuracy for large compute savings
Large-scale Data Training
Trained on the datacomp_1b dataset of roughly one billion image-text pairs, yielding strong generalization

Model Capabilities

Zero-shot Image Classification
Image-Text Matching
Cross-modal Feature Extraction
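At inference time, the zero-shot classification capability listed above reduces to nearest-neighbor search in the shared embedding space: encode one text prompt per class (e.g. "a photo of a {class}"), encode the image, and pick the most similar class. A minimal NumPy sketch (the embeddings here are stand-ins; in practice they come from the model's image and text encoders, typically loaded through the OpenCLIP library, and the exact model/pretrained tags should be checked on the model's page):

```python
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def zero_shot_classify(image_emb, class_text_embs, class_names):
    # Each class is represented by the embedding of a text prompt such as
    # "a photo of a {class_name}". The prediction is the class whose text
    # embedding has the highest cosine similarity to the image embedding.
    sims = l2_normalize(class_text_embs) @ l2_normalize(image_emb)
    return class_names[int(np.argmax(sims))]
```

Because the class list is just a set of text prompts, new categories can be added at inference time without any retraining, which is what "zero-shot" refers to here.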

Use Cases

Image Understanding
Zero-shot Image Classification
Classifies images from unseen categories without task-specific training
Achieves 81.1% accuracy on ImageNet
Content Moderation
Inappropriate Content Detection
Flags images that violate content policies