
CLIP Convnext Xxlarge Laion2b S34b B82k Augreg

Developed by LAION
A CLIP ConvNeXt-XXLarge model trained on the LAION-2B dataset and implemented with the OpenCLIP framework. It is the first non-ViT architecture to achieve >79% zero-shot accuracy on ImageNet.
Downloads 6,616
Release Time: 2/26/2023

Model Overview

This model is a variant of the CLIP architecture that uses ConvNeXt-XXLarge as its image encoder. It was trained on the LAION-2B dataset and supports zero-shot image classification and image-text retrieval.

Model Features

Large-scale ConvNeXt architecture
Uses the 847M-parameter ConvNeXt-XXLarge as the image encoder, the largest pretrained ConvNeXt model at the time of release
High-performance zero-shot classification
Achieves 79.1% zero-shot top-1 accuracy on ImageNet, placing its performance between ViT-g and ViT-G
Optimized training process
Adopts a phased training strategy with a global batch size of up to 95,744, combined with bfloat16 precision and special optimization strategies
Image size adaptability
Compared with ViT architectures, shows better computational efficiency and performance at larger input resolutions

Model Capabilities

Zero-shot image classification
Image-text similarity calculation
Cross-modal retrieval
Image feature extraction
Text feature extraction
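
The capabilities above all reduce to comparing image and text embeddings in a shared space. The following is a minimal sketch of the zero-shot classification step only, assuming the embeddings have already been produced by the model's image and text encoders. The toy 3-dimensional vectors, the prompt strings, the `zero_shot_probs` helper, and the `logit_scale` value of 100 are illustrative assumptions, not values taken from this model.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_probs(image_emb, text_embs, logit_scale=100.0):
    """Return one probability per candidate text prompt for a single image.

    logit_scale is an assumed temperature; a trained CLIP model learns its own.
    """
    image_emb = l2_normalize(image_emb)
    sims = [
        sum(a * b for a, b in zip(image_emb, l2_normalize(text_emb)))
        for text_emb in text_embs
    ]
    return softmax([logit_scale * s for s in sims])


# Toy embeddings standing in for real encoder outputs (hypothetical values).
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
image_emb = [0.9, 0.1, 0.0]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

probs = zero_shot_probs(image_emb, text_embs)
best = max(range(len(labels)), key=lambda i: probs[i])
print(labels[best], round(probs[best], 4))
```

Because class names enter only as text prompts, changing the label set means changing the prompt list; no fine-tuning of the encoders is involved.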

Use Cases

Computer vision
Image classification
Classify images of arbitrary categories without fine-tuning
79.1% Top-1 accuracy on ImageNet
Image retrieval
Retrieve relevant images based on text descriptions
Multimodal research
Vision-language alignment
Study the alignment of image and text representation spaces
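
Text-to-image retrieval, as listed in the use cases above, can be sketched as ranking a set of precomputed image embeddings by cosine similarity to a text query embedding. This is a simplified illustration: the file names, the 3-dimensional vectors, and the `retrieve` helper are hypothetical, and real embeddings would come from the model's encoders.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_emb, image_index, top_k=2):
    """Return the names of the top_k images most similar to the query."""
    scored = [(cosine(query_emb, emb), name) for name, emb in image_index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]


# Hypothetical image embeddings keyed by file name.
image_index = {
    "beach.jpg": [0.8, 0.2, 0.1],
    "forest.jpg": [0.1, 0.9, 0.2],
    "city.jpg": [0.2, 0.1, 0.9],
}

# Hypothetical embedding of a text query such as "a sunny beach".
query_emb = [0.7, 0.3, 0.0]

results = retrieve(query_emb, image_index)
print(results)
```

For large collections, the linear scan here would typically be replaced by an approximate nearest-neighbor index, but the similarity measure stays the same.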