CLIP-ViT-L-14-CommonPool.XL.laion-s13B-b90K
Developed by LAION
A vision-language model based on the CLIP architecture that supports zero-shot image classification, trained on the LAION-filtered CommonPool.XL dataset
Downloads: 176
Release Date: 4/26/2023
Model Overview
This model is a variant of the CLIP architecture that combines a Vision Transformer (ViT) image encoder with a text encoder. It learns the relationship between images and text, making it suitable for cross-modal tasks such as zero-shot image classification.
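The snippet below is a minimal zero-shot classification sketch using the open_clip library. It assumes the model is published on the Hugging Face Hub under the repo id laion/CLIP-ViT-L-14-CommonPool.XL.laion-s13B-b90K (verify the exact id on the Hub) and that a local image file example.jpg exists; both are assumptions, not part of the original card.

```python
# Zero-shot image classification sketch (pip install open_clip_torch).
# The hub repo id and image path are assumptions; adjust them to your setup.
import torch
from PIL import Image
import open_clip

MODEL_ID = "hf-hub:laion/CLIP-ViT-L-14-CommonPool.XL.laion-s13B-b90K"  # assumed repo id

model, _, preprocess = open_clip.create_model_and_transforms(MODEL_ID)
tokenizer = open_clip.get_tokenizer(MODEL_ID)
model.eval()

# Candidate labels are phrased as prompts; no task-specific training is needed.
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
image = preprocess(Image.open("example.jpg")).unsqueeze(0)  # assumed local image
text = tokenizer(labels)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize so the dot product is a cosine similarity.
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```

The label whose text embedding is most similar to the image embedding is taken as the prediction, which is what makes the classification zero-shot.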
Model Features
Zero-shot Learning Capability
Can perform image classification tasks without task-specific training
Cross-modal Understanding
Capable of processing and understanding both visual and textual information
Large-scale Pretraining
Trained at large scale: the s13B-b90K suffix denotes roughly 13 billion samples seen during training with a global batch size of about 90K
Model Capabilities
Image Classification
Cross-modal Retrieval
Image-Text Matching (see the sketch after this list)
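As a sketch of image-text matching, the example below scores a small batch of images against a set of captions and reports the best-matching caption per image. It reuses the assumed repo id from the earlier example; the image paths and captions are likewise placeholders.

```python
# Image-text matching sketch: rank candidate captions for each image.
# Repo id, file paths, and captions are assumptions for illustration.
import torch
from PIL import Image
import open_clip

MODEL_ID = "hf-hub:laion/CLIP-ViT-L-14-CommonPool.XL.laion-s13B-b90K"  # assumed
model, _, preprocess = open_clip.create_model_and_transforms(MODEL_ID)
tokenizer = open_clip.get_tokenizer(MODEL_ID)
model.eval()

image_paths = ["img1.jpg", "img2.jpg"]  # assumed local files
captions = ["a red dress on a mannequin",
            "a wooden dining table",
            "a pair of running shoes"]

images = torch.stack([preprocess(Image.open(p)) for p in image_paths])
text = tokenizer(captions)

with torch.no_grad():
    img_emb = model.encode_image(images)
    txt_emb = model.encode_text(text)
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    similarity = img_emb @ txt_emb.T  # cosine similarity matrix (images x captions)

for i, path in enumerate(image_paths):
    j = similarity[i].argmax().item()
    print(f"{path} -> {captions[j]} (score {similarity[i, j]:.3f})")
```

Transposing the similarity matrix gives the reverse direction (text-to-image retrieval), which is the basis of the visual search use case below.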
Use Cases
Content Management
Automatic Image Tagging
Automatically generates descriptive tags for unlabeled images
E-commerce
Visual Search
Searches for relevant product images via free-text queries (see the sketch after this list)
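The following sketch illustrates text-to-image visual search: a small catalog of product images is embedded once, then ranked against a free-text query. The repo id, catalog file names, and query string are assumptions; in practice the catalog embeddings would be precomputed offline and stored in an index.

```python
# Text-to-image visual search sketch over a small product catalog.
# Repo id, image paths, and the query are assumptions for illustration.
import torch
from PIL import Image
import open_clip

MODEL_ID = "hf-hub:laion/CLIP-ViT-L-14-CommonPool.XL.laion-s13B-b90K"  # assumed
model, _, preprocess = open_clip.create_model_and_transforms(MODEL_ID)
tokenizer = open_clip.get_tokenizer(MODEL_ID)
model.eval()

catalog = ["shoe.jpg", "dress.jpg", "table.jpg"]  # assumed product images

with torch.no_grad():
    # Index step: embed and normalize every catalog image (typically done offline).
    img_emb = model.encode_image(torch.stack([preprocess(Image.open(p)) for p in catalog]))
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)

    # Query step: embed the text query and rank catalog images by cosine similarity.
    query = tokenizer(["red high-heeled shoes"])  # assumed query
    q_emb = model.encode_text(query)
    q_emb = q_emb / q_emb.norm(dim=-1, keepdim=True)
    scores = (q_emb @ img_emb.T).squeeze(0)

for idx in scores.argsort(descending=True).tolist():
    print(f"{catalog[idx]}: {scores[idx]:.3f}")
```

The same pattern, with a fixed vocabulary of tag prompts in place of the query, can serve the automatic image tagging use case above by keeping the top-scoring tags per image.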