CLIP-convnext_base_w-laion2B-s13B-b82K-augreg

Developed by LAION
A CLIP model built on the ConvNeXt-Base architecture and trained with OpenCLIP on a subset of LAION-5B, aimed at zero-shot image classification.
Downloads 40.86k
Release Time: 1/10/2023

Model Overview

This model is a CLIP variant that uses ConvNeXt-Base as the image encoder and is trained on the LAION-2B dataset. It explores an alternative to the ViT and ResNet image encoders and is trained with additional augmentation and regularization.

Model Features

ConvNeXt Architecture
Among the first ConvNeXt CLIP models trained at scale, offering an architectural alternative to ViT and ResNet
Augmentation and Regularization
Trained with random resized crops, random erasing, and stochastic depth to improve model performance
High Sample Efficiency
Achieves over 70% zero-shot top-1 accuracy on ImageNet after training on 13B samples, indicating good sample efficiency
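
Of the techniques listed above, stochastic depth is the least self-explanatory, so here is a minimal sketch of the idea in plain Python. It is not the training code used for this model: the function name `stochastic_depth`, the list-based vectors, and the drop rate of 0.1 are all assumptions chosen for illustration. During training the residual branch is skipped with probability `drop_prob`, and kept branches are rescaled so the expected output matches inference.

```python
import random


def stochastic_depth(x, branch_out, drop_prob, training, rng=random):
    """Return x + branch_out, randomly skipping the branch while training.

    x and branch_out are equal-length lists of floats (hypothetical stand-ins
    for a residual block's input and its branch output).
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        # Inference: the branch is always used, with no rescaling.
        return [a + b for a, b in zip(x, branch_out)]
    if rng.random() < drop_prob:
        # Branch dropped for this sample: only the skip connection remains.
        return list(x)
    keep_prob = 1.0 - drop_prob
    # Branch kept: rescale so the expected value equals the inference output.
    return [a + b / keep_prob for a, b in zip(x, branch_out)]


# Illustrative call in inference mode (drop rate 0.1 is an assumed value).
inference_out = stochastic_depth([1.0, 2.0], [0.5, 0.5], 0.1, training=False)
```

In inference mode the call is a plain residual sum; the randomness only affects training.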

Model Capabilities

Zero-shot image classification
Image-text retrieval
Cross-modal representation learning
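
To make the zero-shot classification capability concrete, the sketch below shows the scoring step that CLIP-style models typically use: normalize the image and text embeddings, take cosine similarities, scale them, and apply a softmax over the candidate labels. The embeddings here are made-up three-dimensional vectors rather than real model outputs, and the helper names and the `logit_scale` value of 100 are assumptions for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def zero_shot_probs(image_emb, text_embs, logit_scale=100.0):
    """Softmax over scaled cosine similarities between one image and N prompts."""
    img = l2_normalize(image_emb)
    logits = []
    for text_emb in text_embs:
        txt = l2_normalize(text_emb)
        logits.append(logit_scale * sum(a * b for a, b in zip(img, txt)))
    # Subtract the max logit for numerical stability before exponentiating.
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy embeddings (assumed values); a real pipeline would take these from the
# model's image and text encoders.
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
text_embs = [[1.0, 0.1, 0.0], [0.1, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_emb = [0.9, 0.2, 0.05]

probs = zero_shot_probs(image_emb, text_embs)
best = labels[probs.index(max(probs))]
print(best)
```

Because the class set is defined only by the text prompts, changing the categories means changing the `labels` list, with no fine-tuning involved.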

Use Cases

Computer Vision
Image Classification
Classify images into arbitrary categories without fine-tuning
71.5% zero-shot top-1 accuracy on ImageNet
Image Retrieval
Retrieve relevant images based on text descriptions
Research
Multimodal Learning Research
Study alignment between visual and language representations
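
The text-to-image retrieval use case listed above reduces to ranking stored image embeddings by their similarity to a text query embedding. The following is a minimal sketch under stated assumptions: the file names, the three-dimensional vectors, and the `retrieve` helper are hypothetical, and a real system would obtain the embeddings from the model's encoders and probably use a vector index instead of a linear scan.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve(query_emb, image_index, top_k=2):
    """Return the names of the top_k images most similar to the query."""
    scored = [(cosine(query_emb, emb), name) for name, emb in image_index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_k]]


# Toy index of precomputed image embeddings (assumed values).
image_index = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "forest.jpg": [0.1, 0.9, 0.1],
    "city.jpg": [0.0, 0.2, 0.9],
}
query_emb = [0.8, 0.3, 0.0]  # stand-in for the embedding of a text query

results = retrieve(query_emb, image_index, top_k=2)
print(results)
```

The same similarity scores can also serve as a simple probe of how well the visual and language representations are aligned.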