
CLIP-ViT-L-14-CommonPool.XL-s13B-b90K

Developed by: LAION
A vision-language pretrained model based on the CLIP architecture, supporting zero-shot image classification and cross-modal retrieval tasks.
Downloads: 4,255
Release date: April 26, 2023

Model Overview

This model is a member of the CLIP family. It uses ViT-L/14 as its visual encoder and was pretrained on the CommonPool.XL dataset, giving it strong cross-modal understanding capabilities.

Model Features

Zero-shot learning capability
Can perform image classification tasks without task-specific fine-tuning
Cross-modal understanding
Capable of understanding semantic relationships between images and text
Large-scale pretraining
Trained on the CommonPool.XL dataset (approximately 13 billion samples), giving it broad coverage of visual and textual concepts

Model Capabilities

Zero-shot image classification
Image-text matching
Cross-modal retrieval
Multimodal feature extraction
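
As a sketch of how these capabilities are typically exercised, the example below performs zero-shot image classification with the OpenCLIP library. The hub identifier, image path, and label set are illustrative assumptions rather than values stated on this page, and this is a minimal usage sketch, not the only supported workflow.

```python
# Minimal zero-shot classification sketch using the open_clip library.
# The hub identifier, image path, and labels below are assumptions for illustration.
import torch
from PIL import Image
import open_clip

model_id = "hf-hub:laion/CLIP-ViT-L-14-CommonPool.XL-s13B-b90K"  # assumed hub ID
model, _, preprocess = open_clip.create_model_and_transforms(model_id)
tokenizer = open_clip.get_tokenizer(model_id)
model.eval()

# Candidate labels are phrased as prompts and ranked against the image.
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
image = preprocess(Image.open("example.jpg")).unsqueeze(0)  # placeholder image path
text = tokenizer(labels)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize so the dot product equals cosine similarity.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(dict(zip(labels, probs[0].tolist())))
```

The same encode_image / encode_text calls also serve as the multimodal feature extractors for image-text matching and retrieval.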

Use Cases

Content retrieval
Text-based image search
Retrieve relevant images using natural language queries
Accurately matches image content with text descriptions (see the retrieval sketch at the end of this section)
Automatic tagging
Automatic image tagging
Generate descriptive labels for images
Produces semantic labels relevant to the image content
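
For the text-based image search use case, one hedged approach is to embed a gallery of images once and rank them by cosine similarity to the text query. This sketch reuses the model, preprocess, and tokenizer objects from the classification example above; the file names and query are placeholders.

```python
# Sketch of text-based image retrieval: embed a small gallery of images once,
# then rank them by cosine similarity to a natural-language query.
# Assumes model, preprocess, and tokenizer from the classification sketch above.
import torch
from PIL import Image

gallery_paths = ["img_001.jpg", "img_002.jpg", "img_003.jpg"]  # placeholder files

with torch.no_grad():
    gallery = torch.cat([preprocess(Image.open(p)).unsqueeze(0) for p in gallery_paths])
    gallery_features = model.encode_image(gallery)
    gallery_features /= gallery_features.norm(dim=-1, keepdim=True)

    query = tokenizer(["a dog playing in the snow"])  # example query
    query_features = model.encode_text(query)
    query_features /= query_features.norm(dim=-1, keepdim=True)

    # Higher cosine similarity means a better match to the query.
    scores = (query_features @ gallery_features.T).squeeze(0)
    ranking = scores.argsort(descending=True)

for idx in ranking.tolist():
    print(f"{gallery_paths[idx]}: {scores[idx].item():.3f}")
```

Automatic tagging follows the same pattern as the zero-shot classification sketch: score an image against a candidate label set and keep the highest-scoring labels.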