
CLIP-ViT-L-14-laion2B-s32B-b82k

Developed by LAION
A vision-language model trained on the English subset of LAION-2B with the OpenCLIP framework, supporting zero-shot image classification and image-text retrieval.
Downloads 79.01k
Release date: September 14, 2022

Model Overview

This model uses the ViT-L/14 architecture and was trained on a 2-billion-sample English subset of the LAION-5B dataset. It maps images and text into a shared embedding space, giving it strong cross-modal understanding.
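As an illustration of that shared embedding space, here is a minimal sketch of loading the checkpoint through OpenCLIP and comparing one image with one caption. The image path and caption are placeholders, not part of the model card.

```python
# Minimal sketch: project an image and a caption into the shared embedding space.
import torch
from PIL import Image
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-L-14", pretrained="laion2b_s32b_b82k"
)
tokenizer = open_clip.get_tokenizer("ViT-L-14")

image = preprocess(Image.open("example.jpg")).unsqueeze(0)  # placeholder path
text = tokenizer(["a photo of a dog"])                      # placeholder caption

with torch.no_grad():
    image_features = model.encode_image(image)  # shape: (1, 768)
    text_features = model.encode_text(text)     # shape: (1, 768)

# Normalize so cosine similarity is a plain dot product
image_features /= image_features.norm(dim=-1, keepdim=True)
text_features /= text_features.norm(dim=-1, keepdim=True)
print(f"Cosine similarity: {(image_features @ text_features.T).item():.3f}")
```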

Model Features

Large-scale Training Data
Trained on 2 billion English samples from the LAION-5B dataset, covering a wide range of visual concepts
Zero-shot Learning Capability
Performs image classification on new categories without fine-tuning (see the sketch after this list)
Cross-modal Understanding
Maps images and text into a shared semantic space, supporting bidirectional image-text retrieval
High Accuracy
Achieves 75.3% zero-shot top-1 accuracy on ImageNet-1k
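The zero-shot capability works by wrapping candidate class names in text prompts and assigning the image to the class whose text embedding is closest to the image embedding. The sketch below uses a hypothetical label set and a placeholder image path; it is not an evaluation recipe.

```python
# Minimal zero-shot classification sketch with a hypothetical label set.
import torch
from PIL import Image
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-L-14", pretrained="laion2b_s32b_b82k"
)
tokenizer = open_clip.get_tokenizer("ViT-L-14")

class_names = ["cat", "dog", "bird"]  # hypothetical label set
prompts = tokenizer([f"a photo of a {c}" for c in class_names])
image = preprocess(Image.open("example.jpg")).unsqueeze(0)  # placeholder path

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(prompts)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

for name, p in zip(class_names, probs[0].tolist()):
    print(f"{name}: {p:.2%}")
```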

Model Capabilities

Zero-shot image classification
Image-to-text retrieval
Text-to-image retrieval (a retrieval sketch follows this list)
Cross-modal feature extraction
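For retrieval, the same embeddings can be precomputed for a gallery of images and ranked against a natural-language query. The sketch below assumes a small placeholder gallery of local files and an illustrative query string.

```python
# Minimal text-to-image retrieval sketch over a placeholder gallery.
import torch
from PIL import Image
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-L-14", pretrained="laion2b_s32b_b82k"
)
tokenizer = open_clip.get_tokenizer("ViT-L-14")

gallery_paths = ["img1.jpg", "img2.jpg", "img3.jpg"]  # placeholder files
images = torch.stack([preprocess(Image.open(p)) for p in gallery_paths])

with torch.no_grad():
    gallery_features = model.encode_image(images)
    gallery_features /= gallery_features.norm(dim=-1, keepdim=True)

    query = tokenizer(["a red car parked on a street"])  # illustrative query
    query_features = model.encode_text(query)
    query_features /= query_features.norm(dim=-1, keepdim=True)

# Rank gallery images by similarity to the query
scores = (query_features @ gallery_features.T).squeeze(0)
for path, score in sorted(zip(gallery_paths, scores.tolist()),
                          key=lambda x: x[1], reverse=True):
    print(f"{score:.3f}  {path}")
```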

Use Cases

Content Retrieval
Image Search Engine
Retrieve relevant images using natural language queries
Intelligent Classification
Dynamic Image Classification
Classify new categories without retraining
Achieves 75.3% zero-shot top-1 accuracy on ImageNet-1k
Creative Assistance
Image Generation Guidance
Provide text-conditioned guidance for generative models