CLIP Convnext Large D 320.laion2B S29b B131k Ft

Developed by laion
A CLIP model based on the ConvNeXt-Large architecture and trained on the LAION-2B dataset. It supports zero-shot image classification and image-text retrieval.
Downloads 3,810
Release Time: 2/11/2023

Model Overview

This model uses ConvNeXt-Large as the visual encoder, paired with a text tower of additional depth and an MLP head on the vision tower. It was fine-tuned at 320x320 resolution and is suited to zero-shot image classification and cross-modal retrieval.

Model Features

High-resolution processing capability
Fine-tuned at 320x320 resolution. At this resolution it is more efficient than comparable models, requiring fewer computational resources.
Enhanced visual MLP head
The vision tower uses an MLP (fc-gelu-drop-fc) head instead of a single projection, improving feature representation.
Large-scale training data
Trained on the LAION-2B dataset (2 billion English samples), covering a wide range of visual concepts.
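
The fc-gelu-drop-fc head named above can be illustrated with a minimal pure-Python sketch. This is not the model's actual implementation: the function names, the toy layer sizes, and the dropout rate are assumptions made for illustration, and the real head operates on much wider feature vectors with learned weights.

```python
import math
import random

def gelu(x):
    """Exact GELU activation, computed with the Gaussian error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def linear(x, weight, bias):
    """Fully connected layer; weight holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def dropout(x, p, training, rng):
    """Inverted dropout with 0 <= p < 1; acts as the identity at inference."""
    if not training or p == 0.0:
        return list(x)
    return [0.0 if rng.random() < p else v / (1.0 - p) for v in x]

def mlp_head(x, w1, b1, w2, b2, p=0.0, training=False, rng=None):
    """fc -> gelu -> drop -> fc, replacing a single linear projection."""
    rng = rng or random.Random(0)
    hidden = [gelu(v) for v in linear(x, w1, b1)]
    hidden = dropout(hidden, p, training, rng)
    return linear(hidden, w2, b2)

# Toy weights (hypothetical): 2 input features -> 2 hidden units -> 1 output.
w1 = [[1.0, 0.0], [0.0, 1.0]]
b1 = [0.0, 0.0]
w2 = [[1.0, 1.0]]
b2 = [0.0]

projected = mlp_head([1.0, 0.0], w1, b1, w2, b2)
print(projected)
```

Compared with a single projection, the extra nonlinearity between the two fully connected layers lets the head learn a non-linear mapping from pooled image features into the shared embedding space.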

Model Capabilities

Zero-shot image classification
Image-text retrieval
Cross-modal representation learning
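
As a rough sketch of how zero-shot classification works with CLIP-style embeddings, the example below compares one image embedding against one text embedding per candidate label and applies a softmax over the scaled cosine similarities. The embeddings are hand-written toy vectors, not outputs of this model, and the function names, prompts, and logit scale of 100 are assumptions chosen for illustration.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def zero_shot_classify(image_emb, text_embs, labels, logit_scale=100.0):
    """Return the best-matching label and the probability of every label."""
    img = l2_normalize(image_emb)
    logits = [
        logit_scale * sum(a * b for a, b in zip(img, l2_normalize(t)))
        for t in text_embs
    ]
    probs = softmax(logits)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs

# Toy embeddings (hypothetical); in practice these would come from the
# model's text and image encoders.
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_emb = [0.9, 0.3, 0.1]

predicted, probs = zero_shot_classify(image_emb, text_embs, labels)
print(predicted)
```

Because the class set is defined only by the text prompts, new categories can be added by embedding new prompts, with no retraining of the model.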

Use Cases

Image understanding
Zero-shot image classification: classify images of new categories without fine-tuning. Achieves 76.6% zero-shot Top-1 accuracy on ImageNet-1k.

Cross-modal retrieval
Image-text retrieval system: build an image retrieval system based on natural language queries.
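
A retrieval system of the kind described here can be sketched as a nearest-neighbour search over precomputed image embeddings. The following example is a simplified illustration under stated assumptions: the file names, the three-dimensional vectors, and the `retrieve` helper are all hypothetical, and a real system would embed images and the text query with the model's encoders and typically use a vector index instead of a linear scan.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_emb, image_index, k=2):
    """Return the k images most similar to the query as (name, score) pairs."""
    scored = [(cosine(query_emb, emb), name) for name, emb in image_index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(name, score) for score, name in scored[:k]]

# Toy index (hypothetical): image name -> precomputed image embedding.
image_index = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "forest.jpg": [0.1, 0.9, 0.1],
    "city.jpg": [0.0, 0.2, 0.9],
}

# Stand-in for the embedding of a text query such as "waves on a sandy shore".
query_emb = [0.8, 0.2, 0.1]

results = retrieve(query_emb, image_index, k=2)
for name, score in results:
    print(name, round(score, 3))
```

Since image embeddings do not depend on the query, they can be computed once and stored; only the text query needs to be embedded at search time.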