
CLIP-ViT-B-32-256x256-DataComp-1b-s34B-b86K

Developed by laion
This is a CLIP ViT-B/32 model trained on the DataComp-1B dataset using the OpenCLIP framework at 256x256 resolution, primarily for zero-shot image classification and image-text retrieval tasks.
Downloads: 4,332
Release date: 9/12/2023

Model Overview

This model is a vision-language model trained on the DataComp-1B dataset, capable of performing tasks such as zero-shot image classification and image-text retrieval.

Model Features

Large-scale data training
Trained on the DataComp-1B dataset of roughly 1.4 billion image-text pairs (34 billion samples seen, per the s34B tag), giving it strong generalization capabilities.
Zero-shot learning capability
Can perform various image classification tasks without task-specific fine-tuning.
High-resolution support
Accepts 256x256-pixel image input, capturing finer visual detail than the standard 224x224 CLIP variants (see the loading sketch below).
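
OpenCLIP ships the matching image preprocessing alongside each model, so the 256x256 input size can be checked directly. A minimal loading sketch, assuming the Hugging Face hub id laion/CLIP-ViT-B-32-256x256-DataComp-1b-s34B-b86K implied by this card's title:

```python
# Sketch: load the model via OpenCLIP and inspect its expected input size.
# The hub id is an assumption based on this model card's title.
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "hf-hub:laion/CLIP-ViT-B-32-256x256-DataComp-1b-s34B-b86K"
)
print(preprocess)                # transform chain should resize/crop to 256x256
print(model.visual.image_size)  # expected to report a 256x256 input size
```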

Model Capabilities

Zero-shot image classification (see the sketch after this list)
Image-text retrieval
Cross-modal understanding
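
A minimal zero-shot classification sketch using the standard OpenCLIP API; the hub id, the image path example.jpg, and the label prompts are illustrative assumptions:

```python
# Sketch: zero-shot classification by comparing an image embedding against
# text embeddings of candidate label prompts. Paths and labels are examples.
import torch
from PIL import Image
import open_clip

model_id = "hf-hub:laion/CLIP-ViT-B-32-256x256-DataComp-1b-s34B-b86K"  # assumed
model, _, preprocess = open_clip.create_model_and_transforms(model_id)
tokenizer = open_clip.get_tokenizer(model_id)
model.eval()

image = preprocess(Image.open("example.jpg")).unsqueeze(0)  # shape 1x3x256x256
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
text = tokenizer(labels)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize so dot products become cosine similarities.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```

The scores are softmax-normalized cosine similarities, so the label whose prompt best matches the image receives the highest probability.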

Use Cases

Image understanding
Zero-shot image classification
Classify images into arbitrary, user-defined label sets with no task-specific training
Achieves 72.7% zero-shot top-1 accuracy on ImageNet-1k
Image-text retrieval
Retrieve relevant images from text queries, or relevant text from images (a retrieval sketch follows this section)
Achieves 64.4% image retrieval recall@5 and 80.7% text retrieval recall@5 on the COCO dataset
Research
Cross-modal learning research
Study the associations between visual and language modalities
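
For retrieval, the same encoders produce embeddings whose cosine similarities can be ranked directly. A sketch of text-to-image retrieval over a small gallery, with hypothetical file names and the same assumed hub id:

```python
# Sketch: text-to-image retrieval by ranking gallery images against a caption
# query via cosine similarity. File names and hub id are illustrative.
import torch
from PIL import Image
import open_clip

model_id = "hf-hub:laion/CLIP-ViT-B-32-256x256-DataComp-1b-s34B-b86K"  # assumed
model, _, preprocess = open_clip.create_model_and_transforms(model_id)
tokenizer = open_clip.get_tokenizer(model_id)
model.eval()

gallery_paths = ["beach.jpg", "city.jpg", "forest.jpg"]  # hypothetical files
images = torch.stack([preprocess(Image.open(p)) for p in gallery_paths])
query = tokenizer(["a sunny beach with palm trees"])

with torch.no_grad():
    image_emb = model.encode_image(images)
    text_emb = model.encode_text(query)
    image_emb /= image_emb.norm(dim=-1, keepdim=True)
    text_emb /= text_emb.norm(dim=-1, keepdim=True)
    scores = (text_emb @ image_emb.T)[0]  # cosine similarity per gallery image

for path, score in sorted(zip(gallery_paths, scores.tolist()),
                          key=lambda x: -x[1]):
    print(f"{path}: {score:.3f}")
```

Image-to-text retrieval is symmetric: embed one image and rank a pool of candidate captions instead.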