
CLIP-convnext_xxlarge-laion2B-s34B-b82K-augreg-rewind

Developed by LAION
A CLIP model with a ConvNeXt-XXLarge image encoder, trained on the LAION-2B dataset with the OpenCLIP framework and intended primarily for zero-shot image classification.
Release Time: 2/26/2023
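The model name follows OpenCLIP's pretrained-tag convention: `laion2b` is the training dataset, `s34B` means roughly 34 billion training samples seen, and `b82K` is a rounded nominal global batch size of about 82K. The sketch below decodes only those three fields and returns any remaining suffix (here `augreg_rewind`) verbatim without interpreting it. `parse_pretrained_tag` is a hypothetical helper written for illustration; it is not part of the OpenCLIP API.

```python
import re

# Hypothetical helper: decodes "<dataset>_s<samples><unit>_b<batch><unit>[_<suffix>]".
# Only dataset, samples seen (s) and batch size (b) are interpreted; the rest is kept as-is.
_TAG_RE = re.compile(
    r"^(?P<dataset>[a-z0-9]+)_s(?P<samples>\d+)(?P<s_unit>[kmb])"
    r"_b(?P<batch>\d+)(?P<b_unit>[kmb])(?:_(?P<suffix>.+))?$",
    re.IGNORECASE,
)
_UNITS = {"k": 10**3, "m": 10**6, "b": 10**9}


def parse_pretrained_tag(tag: str) -> dict:
    """Split an OpenCLIP-style pretrained tag into its numeric training settings."""
    match = _TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"unrecognized tag: {tag!r}")
    return {
        "dataset": match["dataset"],
        "samples_seen": int(match["samples"]) * _UNITS[match["s_unit"].lower()],
        "batch_size": int(match["batch"]) * _UNITS[match["b_unit"].lower()],
        "suffix": match["suffix"] or "",
    }


info = parse_pretrained_tag("laion2b_s34b_b82k_augreg_rewind")
print(info)
# {'dataset': 'laion2b', 'samples_seen': 34000000000, 'batch_size': 82000, 'suffix': 'augreg_rewind'}
```

The batch value is only the rounded label from the name; the exact global batch sizes used in training are the ones listed under Model Features.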

Model Overview

This is a large vision-language model that pairs a ConvNeXt-XXLarge image encoder with a text encoder, trained contrastively so that images and text are embedded in a shared space. It is designed for zero-shot image classification and image-text retrieval.
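Zero-shot classification with a CLIP-style model reduces to comparing one image embedding against the embeddings of text prompts such as "a photo of a cat": both are L2-normalized, their cosine similarities are scaled, and a softmax turns them into class probabilities. The minimal pure-Python sketch below shows only that scoring step. The 3-dimensional vectors are hypothetical stand-ins for real encoder outputs, and the fixed scale of 100 is an assumption borrowed from common CLIP usage examples (the model itself learns its temperature).

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot_probs(image_emb, class_text_embs, logit_scale=100.0):
    """Softmax over scaled cosine similarities between one image and each class prompt."""
    img = l2_normalize(image_emb)
    logits = []
    for text_emb in class_text_embs:
        txt = l2_normalize(text_emb)
        logits.append(logit_scale * sum(a * b for a, b in zip(img, txt)))
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy embeddings standing in for real text and image encoder outputs.
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
class_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_emb = [0.9, 0.1, 0.0]

probs = zero_shot_probs(image_emb, class_embs)
prediction = labels[probs.index(max(probs))]
print(prediction)  # a photo of a cat
```

Because the class set is just a list of prompts, changing the labels requires no retraining, which is what makes the classification "zero-shot".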

Model Features

Large-scale ConvNeXt architecture
Uses an 847M-parameter ConvNeXt-XXLarge image encoder, the largest pre-trained ConvNeXt model released at the time.
High-performance zero-shot classification
Achieves 79.3% top-1 zero-shot accuracy on ImageNet-1k, placing it between the ViT-g and ViT-G CLIP models.
Large-scale training
Trained on up to 1024 GPUs in a distributed setup, with a global batch size ranging from 81,920 to 95,744.
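Very large global batches matter for CLIP-style training because the standard symmetric contrastive (InfoNCE) objective treats every other pair in the batch as a negative: with N matched image-text pairs, each image is classified against N texts and each text against N images. The pure-Python sketch below illustrates that objective as described for CLIP in general; it is not OpenCLIP's tensor implementation, and the fixed `logit_scale=100.0` and toy 2-dimensional vectors are assumptions for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax of a list of logits."""
    peak = max(row)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in row))
    return [z - log_sum for z in row]


def clip_contrastive_loss(image_embs, text_embs, logit_scale=100.0):
    """Symmetric InfoNCE over N pairs; row i of each input is a matched pair."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j] = scaled cosine similarity between image i and text j.
    logits = [[logit_scale * sum(a * b for a, b in zip(imgs[i], txts[j]))
               for j in range(n)] for i in range(n)]
    # Image-to-text: row i is a classification over n texts with target column i.
    loss_i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image: column j is a classification over n images with target row j.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


images = [[1.0, 0.0], [0.0, 1.0]]
matched_texts = [[1.0, 0.0], [0.0, 1.0]]
swapped_texts = [[0.0, 1.0], [1.0, 0.0]]
aligned = clip_contrastive_loss(images, matched_texts)
misaligned = clip_contrastive_loss(images, swapped_texts)
print(aligned < misaligned)  # True
```

In multi-GPU training, embeddings are typically gathered across workers before this step so that the negatives come from the global batch rather than each GPU's local slice.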

Model Capabilities

Zero-shot image classification
Image-text retrieval
Image feature extraction
Text feature extraction
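A minimal sketch of loading the model and extracting image and text features with OpenCLIP (`pip install open_clip_torch`). It assumes the model is registered under the OpenCLIP model name `convnext_xxlarge` with the pretrained tag `laion2b_s34b_b82k_augreg_rewind`; confirm this with `open_clip.list_pretrained()` before relying on it. The helper names `load_model` and `extract_features` are hypothetical, and the 100.0 scale mirrors the OpenCLIP README's zero-shot example.

```python
# Heavy dependencies (open_clip, torch, PIL) are imported inside the functions,
# so this module can be imported without them installed.
MODEL_NAME = "convnext_xxlarge"  # assumed OpenCLIP model name
PRETRAINED_TAG = "laion2b_s34b_b82k_augreg_rewind"  # assumed pretrained tag


def load_model(device="cpu"):
    """Hypothetical helper: build the model, its eval transform, and its tokenizer."""
    import open_clip

    model, _, preprocess = open_clip.create_model_and_transforms(
        MODEL_NAME, pretrained=PRETRAINED_TAG
    )
    model = model.to(device).eval()
    tokenizer = open_clip.get_tokenizer(MODEL_NAME)
    return model, preprocess, tokenizer


def extract_features(model, preprocess, tokenizer, image_path, texts, device="cpu"):
    """Hypothetical helper: normalized image/text features plus zero-shot probabilities."""
    import torch
    from PIL import Image

    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    tokens = tokenizer(texts).to(device)
    with torch.no_grad():
        image_features = model.encode_image(image)
        text_features = model.encode_text(tokens)
    # L2-normalize so the dot product below is a cosine similarity.
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
    return image_features, text_features, probs
```

Typical use would be `model, preprocess, tokenizer = load_model()` followed by `extract_features(model, preprocess, tokenizer, "cat.jpg", ["a photo of a cat", "a photo of a dog"])`, where `cat.jpg` is a placeholder path; the normalized features can be stored for retrieval, and `probs` holds one probability per text label.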

Use Cases

Computer Vision
Image classification
Classify images into arbitrary label sets without task-specific training
Achieves 79.3% top-1 zero-shot accuracy on ImageNet-1k
Image-text retrieval
Search for relevant images given a text description, or rank candidate captions for a given image (the model scores image-text pairs; it does not generate text)
Research
Multimodal learning research
Used for studying representation learning in vision-language models
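Image-text retrieval with a CLIP-style model is a nearest-neighbour search in the shared embedding space: embed the query, compute cosine similarity to every candidate, and sort. The pure-Python sketch below assumes the embeddings have already been computed; the toy 2-dimensional vectors are hypothetical, and `rank_by_similarity` is an illustrative helper, not a library function. The same code ranks captions for an image if the query is an image embedding and the candidates are text embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank_by_similarity(query_emb, candidate_embs, top_k=None):
    """Return candidate indices ordered from most to least similar to the query."""
    scored = [(cosine_similarity(query_emb, emb), idx)
              for idx, emb in enumerate(candidate_embs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    ranked = [idx for _, idx in scored]
    return ranked if top_k is None else ranked[:top_k]


# Hypothetical text-query embedding and a small gallery of image embeddings.
text_query = [1.0, 0.0]
image_gallery = [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]]
print(rank_by_similarity(text_query, image_gallery))  # [1, 2, 0]
```

For large galleries, the linear scan would usually be replaced by an approximate nearest-neighbour index, but the scoring rule stays the same.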