
CLIP-ViT-L-14-DataComp.XL-s13B-b90K

Developed by LAION
This model is a CLIP ViT-L/14 trained on the DataComp-1B dataset, primarily used for zero-shot image classification and image-text retrieval tasks.
Downloads: 586.75k
Released: April 26, 2023

Model Overview

A vision-language model trained using the OpenCLIP framework on the DataComp-1B dataset, capable of performing tasks such as zero-shot image classification and image-text retrieval.
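Loading the checkpoint through OpenCLIP can be sketched as below. This is a hedged example: it assumes the `open_clip_torch` package is installed, and the `pretrained` tag shown is the DataComp-XL checkpoint name as published on the OpenCLIP hub, which may differ between library releases.

```python
def load_clip():
    """Load CLIP ViT-L/14 trained on DataComp-1B via OpenCLIP (sketch).

    Assumes the `open_clip_torch` package is installed; the pretrained
    tag is the DataComp-XL checkpoint name and may vary by release.
    """
    import open_clip  # deferred import: only needed when actually loading

    model, _, preprocess = open_clip.create_model_and_transforms(
        "ViT-L-14", pretrained="datacomp_xl_s13b_b90k"
    )
    tokenizer = open_clip.get_tokenizer("ViT-L-14")
    return model, preprocess, tokenizer


def build_prompts(classnames):
    """Wrap class names in a simple zero-shot prompt template."""
    return [f"a photo of a {name}" for name in classnames]
```

Calling `load_clip()` downloads the weights on first use; `build_prompts` produces the text inputs that are encoded once per class and reused across images.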

Model Features

Large-scale training data
Trained on the DataComp-1B dataset of roughly 1.4 billion image-text pairs (about 13 billion samples seen during training), covering a wide range of visual concepts
Zero-shot capability
Capable of performing image classification tasks on new categories without fine-tuning
Cross-modal understanding
Simultaneously understands image and text information, supporting image-text retrieval tasks
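The zero-shot mechanism behind these features reduces to simple vector arithmetic: image and prompt embeddings are unit-normalized, compared by cosine similarity, and scaled logits are passed through a softmax. A minimal pure-Python sketch (toy embeddings, not real CLIP outputs; the temperature of 100 mirrors CLIP's typical learned logit scale):

```python
import math


def normalize(v):
    """Scale a vector to unit length (CLIP compares embeddings by cosine)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def zero_shot_probs(image_emb, text_embs, temperature=100.0):
    """Softmax over cosine similarities between one image and N class prompts."""
    img = normalize(image_emb)
    sims = [sum(a * b for a, b in zip(img, normalize(t))) for t in text_embs]
    logits = [temperature * s for s in sims]
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

The predicted class is simply the index with the highest probability; no gradient step or fine-tuning is involved.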

Model Capabilities

Zero-shot image classification
Image-text retrieval
Cross-modal understanding
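Image-text retrieval uses the same shared embedding space: encode the text query once, then rank candidate image embeddings by cosine similarity. A self-contained sketch with toy vectors (in practice the embeddings come from the model's image and text encoders):

```python
import math


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def retrieve(query_emb, image_embs, top_k=3):
    """Return indices of the top-k images most similar to a text query."""
    ranked = sorted(
        range(len(image_embs)),
        key=lambda i: cosine(query_emb, image_embs[i]),
        reverse=True,
    )
    return ranked[:top_k]
```

Swapping the roles of the two modalities (one image query against many candidate captions) gives image-to-text retrieval with the same function.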

Use Cases

Computer vision
Image classification
Classify images into novel categories without task-specific training
Achieves 79.2% zero-shot top-1 accuracy on ImageNet-1k
Image-text retrieval
Search for relevant images based on text descriptions, or retrieve matching text descriptions for a given image
Research
Multimodal research
Study representation learning and transfer capabilities of vision-language models