
ViT-H-14-CLIPA-datacomp1B

Developed by UCSC-VLAA
An efficient contrastive vision-language model based on CLIPA-v2, designed for zero-shot image classification.
Downloads 65
Release Time: 10/17/2023

Model Overview

This model is a contrastive vision-language model based on the CLIPA-v2 architecture, used primarily for zero-shot image classification. Contrastive training maps images and text into a shared embedding space, so an image can be classified against arbitrary text labels without task-specific fine-tuning.
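As a minimal sketch of how this works in practice, the snippet below runs zero-shot classification with the OpenCLIP library, assuming the weights are published under the Hugging Face hub ID UCSC-VLAA/ViT-H-14-CLIPA-datacomp1B; the file image.png and the candidate labels are placeholders.

```python
import torch
import open_clip
from PIL import Image

# Assumed hub ID, based on the developer and model name shown on this page.
model_id = 'hf-hub:UCSC-VLAA/ViT-H-14-CLIPA-datacomp1B'
model, preprocess = open_clip.create_model_from_pretrained(model_id)
tokenizer = open_clip.get_tokenizer(model_id)
model.eval()

# Encode one image and a set of free-form candidate labels.
image = preprocess(Image.open('image.png')).unsqueeze(0)  # (1, 3, H, W)
labels = ['a diagram', 'a dog', 'a cat', 'a beignet']
text = tokenizer(labels)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # L2-normalize so the dot product equals cosine similarity.
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    # Scaled similarities -> probabilities over the candidate labels.
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(dict(zip(labels, probs.squeeze(0).tolist())))
```

Because the labels are plain text, swapping in a new label set yields a new classifier with no retraining.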

Model Features

Efficient zero-shot classification: classifies images without task-specific training.
Large-scale data training: trained on the mlfoundations/datacomp_1b dataset.
High accuracy: achieves 81.1% zero-shot accuracy on ImageNet.
Cost-effective: delivers high performance at a comparatively low training cost.

Model Capabilities

Zero-shot image classification
Image-text matching
Multimodal feature extraction

Use Cases

Image classification
Zero-shot object recognition: recognizes objects from new categories without any additional training. Example: accurately identifying a beignet (a French doughnut).
Multimodal applications
Image search: retrieves relevant images from free-form text queries, as sketched below.
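For the image-search use case, here is a minimal sketch under the same assumptions (OpenCLIP weights at the hub ID above; the photos/ folder and the query string are placeholders). It embeds a text query and ranks local images by cosine similarity in the shared embedding space.

```python
import glob
import torch
import open_clip
from PIL import Image

model_id = 'hf-hub:UCSC-VLAA/ViT-H-14-CLIPA-datacomp1B'  # assumed hub ID
model, preprocess = open_clip.create_model_from_pretrained(model_id)
tokenizer = open_clip.get_tokenizer(model_id)
model.eval()

# Embed every image in a hypothetical local folder.
paths = sorted(glob.glob('photos/*.jpg'))
images = torch.stack([preprocess(Image.open(p).convert('RGB')) for p in paths])

with torch.no_grad():
    image_emb = model.encode_image(images)
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)

    # Embed the text query into the same space.
    query = tokenizer(['a dog playing on the beach'])
    text_emb = model.encode_text(query)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

# Rank images by cosine similarity to the query and show the top 5.
scores = (image_emb @ text_emb.T).squeeze(1)
for i in scores.argsort(descending=True)[:5].tolist():
    print(f'{scores[i].item():.3f}  {paths[i]}')
```

In a real deployment the image embeddings would be computed once and stored in an index, so each query only requires one text encoding and a nearest-neighbor lookup.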