CLIP Convnext Base W Laion2b S13b B82k

Developed by LAION
A CLIP model based on the ConvNeXt-Base architecture and trained on a subset of LAION-5B. It supports zero-shot image classification and image-text retrieval.
Downloads 4,522
Release Time: 1/3/2023

Model Overview

This model is a CLIP variant trained with the OpenCLIP framework. It uses ConvNeXt-Base as the image encoder, was trained on a subset of the LAION-5B dataset, and reaches 71.7% zero-shot accuracy on ImageNet.

Model Features

ConvNeXt architecture
Uses ConvNeXt-Base as the image encoder, exploring an alternative to the ViT and ResNet image encoders used in other CLIP models
Enhanced regularization
Improves model performance with stronger augmentation and regularization during training: random resized crops, random erasing, and stochastic depth
High-resolution training
Trained at a 320x320 input resolution to improve image recognition
Large-scale training
Trained on 13B samples drawn from a subset of the LAION-5B dataset, with good sample efficiency for that training budget
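
To make the stochastic depth item above concrete, here is a minimal pure-Python sketch of the idea for a single sample. It is not the OpenCLIP or ConvNeXt implementation: the function names `drop_path` and `residual_block` and the drop probability of 0.5 are illustrative assumptions, not values taken from this model's training recipe.

```python
import random

def drop_path(branch_output, drop_prob, training, rng=random):
    """Stochastic depth for one sample: randomly skip a residual branch."""
    if not training or drop_prob == 0.0:
        # At inference the branch is always kept, unscaled.
        return list(branch_output)
    keep_prob = 1.0 - drop_prob
    if rng.random() < keep_prob:
        # Keep the branch and rescale so its expected value matches inference.
        return [v / keep_prob for v in branch_output]
    # Drop the branch: only the skip connection contributes.
    return [0.0 for _ in branch_output]

def residual_block(x, branch_fn, drop_prob, training, rng=random):
    """y = x + drop_path(branch_fn(x)); branch_fn is a hypothetical stand-in layer."""
    branch = drop_path(branch_fn(x), drop_prob, training, rng)
    return [a + b for a, b in zip(x, branch)]

def double(v):
    # Toy branch used only for illustration.
    return [2.0 * t for t in v]

x = [1.0, 2.0]
eval_out = residual_block(x, double, drop_prob=0.5, training=False)
train_out = residual_block(x, double, drop_prob=0.5, training=True)
```

During training the block output is either the input unchanged (branch dropped) or the input plus the rescaled branch, so deeper layers are randomly skipped and the network is regularized.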

Model Capabilities

Zero-shot image classification
Image-text retrieval
Cross-modal representation learning
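
As a rough illustration of how zero-shot classification works once a CLIP model has produced embeddings, the sketch below scores one image embedding against one text embedding per candidate label. The three-dimensional vectors are made-up toy values, not outputs of this model, and the `logit_scale` of 100 is an assumed temperature for illustration; in practice the embeddings would come from the model's image and text encoders.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def zero_shot_probs(image_emb, text_embs, logit_scale=100.0):
    """Softmax over scaled cosine similarities between one image and each prompt."""
    img = l2_normalize(image_emb)
    logits = []
    for text_emb in text_embs:
        txt = l2_normalize(text_emb)
        logits.append(logit_scale * sum(a * b for a, b in zip(img, txt)))
    # Subtract the max logit for numerical stability before exponentiating.
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy embeddings (hypothetical values standing in for encoder outputs).
labels = ["a photo of a cat", "a photo of a dog"]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
image_emb = [0.9, 0.1, 0.0]

probs = zero_shot_probs(image_emb, text_embs)
predicted = labels[probs.index(max(probs))]
```

No fine-tuning is involved: changing the set of classes only means writing different text prompts and re-encoding them.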

Use Cases

Computer vision
Image classification
Classify images without fine-tuning
71.7% zero-shot accuracy on ImageNet
Image-text retrieval
Enable image-to-text or text-to-image retrieval
Research
Multimodal research
Study joint vision-language representation learning
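
The image-text retrieval use case listed above reduces to ranking items by embedding similarity. The following is a minimal sketch under the assumption that image embeddings have already been computed and stored; the file names and vectors are invented for illustration, and `retrieve` is a hypothetical helper rather than part of any library.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_emb, gallery, top_k=2):
    """Return the names of the top_k gallery items most similar to the query."""
    scored = [(cosine(query_emb, emb), name) for name, emb in gallery.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]

# Hypothetical precomputed image embeddings keyed by file name.
gallery = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "forest.jpg": [0.1, 0.9, 0.1],
    "city.jpg": [0.0, 0.2, 0.9],
}
# Hypothetical text embedding for a query such as "a sunny beach".
query = [1.0, 0.0, 0.1]

results = retrieve(query, gallery)
```

Text-to-image and image-to-text retrieval use the same ranking; only which side supplies the query and which side forms the gallery changes.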