
CLIP-ViT-B-16-DataComp.XL-s13b-b90k

Developed by LAION
This is a CLIP ViT-B/16 model trained using OpenCLIP on the DataComp-1B dataset, primarily used for zero-shot image classification and image-text retrieval.
Downloads: 4,461
Release date: 5/15/2023

Model Overview

Based on the CLIP architecture, this model achieves zero-shot image classification capabilities through large-scale multimodal training and supports cross-modal retrieval tasks.

Model Features

Large-scale Multimodal Training
Trained on the DataComp-1B dataset of 1.4 billion image-text pairs, with roughly 13 billion samples seen during training (the "s13b" in the model name), demonstrating strong generalization.
Zero-shot Learning Capability
Performs various vision tasks such as zero-shot image classification without task-specific fine-tuning.
Cross-modal Understanding
Capable of understanding semantic relationships between images and text, supporting cross-modal retrieval tasks.
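The zero-shot mechanism behind these features is simple: the image and each candidate class prompt are embedded, L2-normalized, compared by cosine similarity, scaled by a learned logit scale, and passed through a softmax over classes. The sketch below uses mock low-dimensional embeddings and a hypothetical logit scale of 100 purely for illustration (real ViT-B/16 embeddings are 512-dimensional and come from the model's encoders):

```python
import math

def normalize(v):
    """L2-normalize a vector, as CLIP does before comparing embeddings."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def zero_shot_classify(image_emb, text_embs, logit_scale=100.0):
    """Score one image embedding against one text embedding per class.

    Cosine similarity between normalized embeddings, scaled and
    softmaxed over the candidate classes.
    """
    img = normalize(image_emb)
    logits = [
        logit_scale * sum(i * t for i, t in zip(img, normalize(te)))
        for te in text_embs
    ]
    return softmax(logits)

# Mock 3-D embeddings; in practice these come from the image/text encoders.
image = [0.9, 0.1, 0.0]      # pretend this encodes a photo of a dog
prompts = [
    [1.0, 0.0, 0.0],         # "a photo of a dog"
    [0.0, 1.0, 0.0],         # "a photo of a cat"
]
probs = zero_shot_classify(image, prompts)
print(probs)  # highest probability on the first ("dog") prompt
```

Because the class set is defined only by the text prompts, classes can be added or changed at inference time without retraining.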

Model Capabilities

Zero-shot Image Classification
Image-Text Retrieval
Cross-modal Understanding
Image Feature Extraction

Use Cases

Computer Vision
Zero-shot Image Classification
Classifies images into arbitrary, user-defined categories without task-specific training.
Achieves 73.5% zero-shot top-1 accuracy on ImageNet-1k.
Image Retrieval
Retrieves relevant images based on text descriptions or retrieves relevant text based on images.
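Retrieval reduces to ranking candidate embeddings by cosine similarity to a query embedding; the same routine serves both directions (text-to-image and image-to-text). A minimal sketch with mock 2-D embeddings (real embeddings come from the model's encoders):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def retrieve(query_emb, candidate_embs, top_k=3):
    """Return indices of the top_k candidates most similar to the query.

    For text->image retrieval the query is a text embedding and the
    candidates are image embeddings; image->text is the reverse.
    """
    ranked = sorted(
        range(len(candidate_embs)),
        key=lambda i: cosine(query_emb, candidate_embs[i]),
        reverse=True,
    )
    return ranked[:top_k]

# Mock embeddings for illustration only.
text_query = [0.8, 0.2]
image_embs = [[0.1, 0.9], [0.9, 0.1], [0.5, 0.5]]
top = retrieve(text_query, image_embs, top_k=2)
print(top)  # -> [1, 2]: candidate 1 is the closest match
```

In a production system the candidate embeddings would be precomputed once and served from an approximate-nearest-neighbor index rather than scored exhaustively.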
Research Applications
Multimodal Research
Used for studying representation learning and cross-modal understanding in vision-language models.