
CLIP ConvNeXt Base W 320 LAION Aesthetic S13B B82K

Developed by laion
A CLIP model based on the ConvNeXt-Base architecture, trained on a subset of LAION-5B, suitable for zero-shot image classification and image-text retrieval tasks.
Downloads 12.67k
Release Time: 1/3/2023

Model Overview

This is a CLIP model based on the ConvNeXt-Base architecture, trained with OpenCLIP on a subset of LAION-5B. The ConvNeXt models in this series explore an alternative to the ViT and ResNet image towers used by earlier CLIP models, and the architecture scales well with both model size and image resolution.
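OpenCLIP trains CLIP models with a symmetric contrastive objective: in a batch of N image-text pairs, each image should score highest against its own caption, and each caption highest against its own image. The pure-Python sketch below illustrates that idea on toy embeddings. The function names, the fixed `logit_scale` of 100 (a learned parameter in real training), and the list-based math are illustrative assumptions, not OpenCLIP's implementation.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def logsumexp(xs):
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def clip_contrastive_loss(image_emb, text_emb, logit_scale=100.0):
    """Symmetric cross-entropy over the N x N image-text similarity matrix.

    Row i of image_emb and row i of text_emb are assumed to be a matching pair.
    """
    imgs = [l2_normalize(v) for v in image_emb]
    txts = [l2_normalize(v) for v in text_emb]
    n = len(imgs)
    logits = [[logit_scale * sum(a * b for a, b in zip(i, t)) for t in txts] for i in imgs]
    # Image -> text: each row should put its mass on the diagonal entry.
    loss_i2t = sum(logsumexp(logits[r]) - logits[r][r] for r in range(n)) / n
    # Text -> image: each column should put its mass on the diagonal entry.
    cols = [[logits[r][c] for r in range(n)] for c in range(n)]
    loss_t2i = sum(logsumexp(cols[c]) - cols[c][c] for c in range(n)) / n
    return (loss_i2t + loss_t2i) / 2
```

Real training computes the same quantity with batched tensor operations over much larger batches; the explicit loops here only keep the arithmetic visible.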

Model Features

ConvNeXt architecture
The first ConvNeXt CLIP models trained at scale, exploring ConvNeXt as an alternative to ViT and ResNet image encoders
Augmentation and regularization
Uses augmentation and regularization techniques such as random resized cropping, random erasing, and stochastic depth to improve model performance
High-resolution training
Some models in the series, including this W 320 variant, are trained at a higher resolution of 320x320 to improve image recognition accuracy
High sample efficiency
Reaches higher accuracy than the ViT-B/16 model while using fewer training samples
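Two of the augmentation and regularization techniques listed above can be sketched in plain Python. This is a minimal illustration under assumed defaults: the crop sampler follows the common RandomResizedCrop recipe (area scale 0.08 to 1.0, log-uniform aspect ratio between 3/4 and 4/3, ten attempts, then a center-crop fallback), and the stochastic-depth helper drops one residual branch per call. Neither is the exact training configuration of this model, the function names are hypothetical, and random erasing is not shown.

```python
import math
import random


def random_resized_crop_box(width, height, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3), rng=None):
    """Sample a crop box (left, top, crop_w, crop_h) in RandomResizedCrop style.

    The crop would later be resized to the training resolution (e.g. 320x320).
    """
    rng = rng if rng is not None else random
    area = width * height
    log_lo, log_hi = math.log(ratio[0]), math.log(ratio[1])
    for _ in range(10):
        target_area = area * rng.uniform(scale[0], scale[1])
        aspect = math.exp(rng.uniform(log_lo, log_hi))  # log-uniform aspect ratio
        w = int(round(math.sqrt(target_area * aspect)))
        h = int(round(math.sqrt(target_area / aspect)))
        if 0 < w <= width and 0 < h <= height:
            return rng.randint(0, width - w), rng.randint(0, height - h), w, h
    # Fallback: center crop, clamped to the allowed aspect-ratio range.
    in_ratio = width / height
    if in_ratio < ratio[0]:
        w, h = width, int(round(width / ratio[0]))
    elif in_ratio > ratio[1]:
        w, h = int(round(height * ratio[1])), height
    else:
        w, h = width, height
    return (width - w) // 2, (height - h) // 2, w, h


def drop_path_residual(x, branch, drop_prob, training, rng=None):
    """Stochastic depth for one residual block computing x + branch(x).

    During training the branch is skipped with probability drop_prob; when kept,
    it is scaled by 1 / (1 - drop_prob) so its expected contribution is unchanged.
    """
    if training and drop_prob > 0.0:
        rng = rng if rng is not None else random
        if rng.random() < drop_prob:
            return list(x)  # branch dropped: the block acts as the identity
        keep = 1.0 - drop_prob
        return [a + b / keep for a, b in zip(x, branch(x))]
    return [a + b for a, b in zip(x, branch(x))]
```

In practice stochastic depth is usually applied per sample within a batch, with the drop probability often increasing with network depth; the per-call version above keeps the idea in isolation.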

Model Capabilities

Zero-shot image classification
Image-text retrieval
Image feature extraction
Text feature extraction
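Image and text feature extraction both produce embeddings in a shared space, so image-text retrieval reduces to ranking cosine similarities. The sketch below assumes the features have already been computed by the model's image and text encoders (any vectors passed in are placeholders) and uses a hypothetical fixed `logit_scale`; it builds the image-by-text similarity matrix and ranks images for one text query.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def similarity_matrix(image_features, text_features, logit_scale=100.0):
    """Return an images x texts matrix of scaled cosine similarities."""
    imgs = [l2_normalize(v) for v in image_features]
    txts = [l2_normalize(v) for v in text_features]
    return [[logit_scale * sum(a * b for a, b in zip(i, t)) for t in txts] for i in imgs]


def retrieve_images(text_index, logits, k=1):
    """Indices of the k images scoring highest for one text query (one column)."""
    scores = [row[text_index] for row in logits]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
```

Text-to-image retrieval reads a column of the matrix; image-to-text retrieval would read a row instead.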

Use Cases

Computer Vision
Image classification
Classify images without fine-tuning
ImageNet zero-shot top-1 accuracy of 71.7%
Image retrieval
Retrieve relevant images based on text descriptions
Multimodal research
Vision-language alignment
Study how image and text representations are aligned
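Zero-shot image classification turns class names into text prompts, embeds them with the text encoder, and picks the class whose embedding is closest to the image embedding. A common refinement, assumed here rather than taken from this model card, averages the embeddings of several prompt templates (for example "a photo of a {}.") per class. The sketch starts from precomputed, hypothetical prompt embeddings; `build_classifier`, `zero_shot_probs`, and the fixed `logit_scale` are illustrative names and values.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def build_classifier(prompt_embeddings):
    """Map {class_name: [embedding per prompt template]} to one unit vector per class.

    Each prompt embedding is normalized, the per-class mean is taken, and the mean
    is renormalized (prompt ensembling).
    """
    weights = {}
    for name, embs in prompt_embeddings.items():
        normed = [l2_normalize(e) for e in embs]
        mean = [sum(col) / len(normed) for col in zip(*normed)]
        weights[name] = l2_normalize(mean)
    return weights


def zero_shot_probs(image_embedding, weights, logit_scale=100.0):
    """Softmax over scaled cosine similarities between the image and each class."""
    img = l2_normalize(image_embedding)
    logits = {n: logit_scale * sum(a * b for a, b in zip(img, w)) for n, w in weights.items()}
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {n: math.exp(v - m) for n, v in logits.items()}
    total = sum(exps.values())
    return {n: e / total for n, e in exps.items()}
```

Because no classifier is trained, adding a new class only requires embedding new prompts; the predicted label is the class with the highest probability.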