
CLIP Convnext Large D 320.laion2B S29b B131k Ft Soup

Developed by laion
CLIP model based on ConvNeXt-Large architecture, trained on LAION-2B dataset, supporting zero-shot image classification and image-text retrieval tasks
Downloads 83.56k
Release Time: 2/11/2023

Model Overview

This is a CLIP model built on the ConvNeXt-Large architecture and trained with the OpenCLIP framework on the LAION-2B dataset. It supports zero-shot image classification and image-text retrieval, and its image encoder can also be used on its own to extract image features.
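As a rough illustration of how a checkpoint like this is usually loaded with OpenCLIP, the sketch below uses open_clip.create_model_and_transforms and open_clip.get_tokenizer. The hub identifier in MODEL_ID is an assumption inferred from the model name and should be checked against the actual repository. The helper names load_clip and embed are hypothetical.

```python
# Assumed Hugging Face Hub id, inferred from the model name; verify before use.
MODEL_ID = "hf-hub:laion/CLIP-convnext_large_d_320.laion2B-s29B-b131K-ft-soup"


def load_clip(model_id: str = MODEL_ID):
    """Load the model, its evaluation preprocessing transform, and its tokenizer."""
    # Imported lazily so this file can be read without the dependency installed
    # (package name on PyPI: open_clip_torch).
    import open_clip

    # create_model_and_transforms returns (model, train_transform, eval_transform).
    model, _, preprocess = open_clip.create_model_and_transforms(model_id)
    tokenizer = open_clip.get_tokenizer(model_id)
    model.eval()
    return model, preprocess, tokenizer


def embed(model, preprocess, tokenizer, image, texts):
    """Return L2-normalized image and text embeddings for one PIL image and a list of strings."""
    import torch

    with torch.no_grad():
        image_features = model.encode_image(preprocess(image).unsqueeze(0))
        text_features = model.encode_text(tokenizer(texts))
    # Normalize so that a dot product between embeddings equals cosine similarity.
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    return image_features, text_features
```

Calling load_clip() downloads the weights on first use, so it is kept separate from embed, which only runs the two encoders.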

Model Features

High-resolution processing capability
Accepts 320x320 input images, retaining finer detail than models that run at the standard 256x256 resolution
Weight averaging optimization
Averages the weights of several fine-tuned checkpoints (a model "soup") to improve performance
Efficient architecture design
At 320x320 resolution, the ConvNeXt-Large-D architecture is more efficient than comparable models
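The weight averaging mentioned above can be sketched in a few lines. This is a minimal illustration of a uniform soup, assuming every checkpoint has the same parameter names and shapes. The page does not say how many checkpoints were averaged or whether the average was uniform, so the recipe here is an assumption, and average_weights is a hypothetical helper that uses plain lists in place of tensors.

```python
def average_weights(checkpoints):
    """Uniformly average parameters that share a name across checkpoints.

    Each checkpoint is a dict mapping a parameter name to a flat list of floats
    (a stand-in for a tensor; a real state dict would hold tensors instead).
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    names = checkpoints[0].keys()
    for ckpt in checkpoints:
        if ckpt.keys() != names:
            raise ValueError("checkpoints must share the same parameter names")
    count = len(checkpoints)
    return {
        name: [sum(values) / count for values in zip(*(c[name] for c in checkpoints))]
        for name in names
    }


# Two toy "fine-tuned" checkpoints with a single two-element parameter each.
soup = average_weights([{"w": [1.0, 3.0]}, {"w": [3.0, 5.0]}])
```

The result is a single set of weights, so inference cost is the same as for one model, unlike an ensemble that runs every checkpoint.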

Model Capabilities

Zero-shot image classification
Image-text retrieval
Cross-modal understanding
Image feature extraction
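Zero-shot classification with a CLIP-style model reduces to comparing one image embedding against one text embedding per candidate label. The sketch below shows only that scoring step in plain Python, assuming the embeddings have already been produced by the model. The function names are hypothetical, and the logit scale of 100 is an assumed value typical of CLIP models, not one read from this checkpoint.

```python
import math


def l2_normalize(vector):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def zero_shot_probs(image_emb, text_embs, logit_scale=100.0):
    """Return one probability per label from cosine similarity plus softmax."""
    image = l2_normalize(image_emb)
    logits = [
        logit_scale * sum(a * b for a, b in zip(image, l2_normalize(text)))
        for text in text_embs
    ]
    # Subtract the maximum logit before exponentiating for numerical stability.
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy 2-D embeddings: the image points the same way as the first label.
probs = zero_shot_probs([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

In practice each label is usually wrapped in a prompt such as "a photo of a dog" before being encoded, and the label with the highest probability is taken as the prediction.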

Use Cases

Image classification
Zero-shot image classification
Classify images into arbitrary categories without task-specific training
Achieves 76.9% zero-shot Top-1 accuracy on ImageNet-1k
Information retrieval
Image-text retrieval
Retrieve relevant images for a text query, or find related text for a given image
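Image-text retrieval uses the same embeddings: the query (text or image) is embedded once and the candidates are ranked by cosine similarity. The following is a minimal sketch under that assumption, written in plain Python with hypothetical helper names. A real system would use batched tensor operations or a vector index instead of a Python loop.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve(query_emb, candidate_embs, top_k=3):
    """Return (index, similarity) pairs for the top_k most similar candidates."""
    scored = [(idx, cosine(query_emb, emb)) for idx, emb in enumerate(candidate_embs)]
    # Highest similarity first; ties keep the lower index first.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


# Toy example: a text-query embedding ranked against three image embeddings.
hits = retrieve([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]], top_k=2)
```

The same function covers both directions: pass a text embedding as the query to search images, or an image embedding as the query to search captions.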