
CLIP Convnext Base W Laion Aesthetic S13b B82k

Developed by LAION
CLIP model with ConvNeXt-Base architecture trained on the LAION-Aesthetic dataset, supporting zero-shot image classification and cross-modal retrieval tasks
Downloads 703
Release Time: 1/3/2023

Model Overview

This is a CLIP model built on the ConvNeXt-Base architecture and trained with the OpenCLIP framework on the LAION-Aesthetic dataset. It explores ConvNeXt as an alternative to the ViT and ResNet architectures and reaches 71.0% zero-shot accuracy on ImageNet-1k.

Model Features

ConvNeXt Architecture Innovation
One of the first ConvNeXt CLIP models trained at large scale, exploring the potential of this architecture in CLIP tasks
Enhanced Training Strategy
Uses augmentation and regularization techniques, including random resized crops, random erasing, and stochastic depth, to improve model performance
High Sample Efficiency
Achieves over 71% zero-shot accuracy on ImageNet after 13 billion training samples, outperforming ViT-B/16 trained on the same number of samples
Multi-Resolution Support
Provides versions with 256x256 and 320x320 resolutions to accommodate different application scenarios
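
To make the random erasing augmentation named above more concrete, here is a minimal sketch in plain Python. It is a simplified illustration, not the training code used for this model: the function name random_erase, the max_fraction and fill parameters, and the 8x8 toy image are all hypothetical, and a production version would typically sample the erased area and aspect ratio and apply the erase only with some probability.

```python
import random

def random_erase(image, rng, max_fraction=0.3, fill=0):
    """Return a copy of `image` (a list of rows) with one random rectangle set to `fill`.

    Simplified illustration; `max_fraction` and `fill` are hypothetical parameters.
    """
    height, width = len(image), len(image[0])
    # Box sides are drawn between 1 pixel and max_fraction of each dimension.
    box_h = rng.randint(1, max(1, int(height * max_fraction)))
    box_w = rng.randint(1, max(1, int(width * max_fraction)))
    # Choose the top-left corner so the box stays inside the image.
    top = rng.randint(0, height - box_h)
    left = rng.randint(0, width - box_w)
    erased = [row[:] for row in image]  # copy rows so the input is not modified
    for r in range(top, top + box_h):
        for c in range(left, left + box_w):
            erased[r][c] = fill
    return erased, (top, left, box_h, box_w)

rng = random.Random(0)                 # seeded for reproducibility
image = [[1] * 8 for _ in range(8)]    # toy 8x8 single-channel image
erased, box = random_erase(image, rng)
```

The erased region forces the model to rely on the rest of the image, which is the usual motivation for this kind of augmentation.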

Model Capabilities

Zero-shot image classification
Image-text matching
Cross-modal retrieval
Image feature extraction
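
The capabilities above all rest on comparing image and text embeddings in a shared space. The following sketch shows how zero-shot classification scores could be computed once embeddings are available. It assumes toy 3-dimensional vectors in place of real model output, and a fixed logit scale of 100, a value commonly seen in CLIP-style models; the actual model carries its own learned scale. The helper names l2_normalize and zero_shot_probs are hypothetical.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def zero_shot_probs(image_emb, text_embs, logit_scale=100.0):
    """Return one probability per candidate label for a single image.

    `logit_scale` is an assumed temperature, not a value read from the model.
    """
    img = l2_normalize(image_emb)
    logits = []
    for text_emb in text_embs:
        txt = l2_normalize(text_emb)
        cosine = sum(a * b for a, b in zip(img, txt))
        logits.append(logit_scale * cosine)
    # Numerically stable softmax over the scaled similarities.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy embeddings (hypothetical values, not produced by the model).
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
image_emb = [0.9, 0.1, 0.0]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

probs = zero_shot_probs(image_emb, text_embs)
best_label = labels[probs.index(max(probs))]
```

In practice the image embedding would come from the model's image encoder and each text embedding from its text encoder applied to a prompt for one candidate class, so new categories can be added by writing new prompts.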

Use Cases

Content Retrieval
Image Search Engine
Retrieve relevant images based on text queries
Reverse Image Search
Find similar or related images based on image content
Classification Systems
Zero-shot Classification
Classify new categories without fine-tuning
Reported result: 71.0% zero-shot accuracy on ImageNet-1k
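
The retrieval use cases above reduce to ranking stored embeddings by similarity to a query embedding. Below is a minimal sketch under stated assumptions: the file names and 3-dimensional vectors are hypothetical stand-ins for embeddings the model would produce, and the brute-force scan stands in for whatever index a real search engine would use.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_emb, index, k=2):
    """Rank (item_id, embedding) pairs by cosine similarity to the query."""
    scored = [(item_id, cosine(query_emb, emb)) for item_id, emb in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical image embeddings, as if precomputed for a small collection.
image_index = [
    ("beach.jpg", [0.9, 0.1, 0.2]),
    ("forest.jpg", [0.1, 0.95, 0.1]),
    ("city.jpg", [0.2, 0.1, 0.9]),
]

# Hypothetical embedding of a text query; an image embedding works the same way.
text_query_emb = [1.0, 0.0, 0.1]
results = top_k(text_query_emb, image_index, k=2)
```

Because text and images share one embedding space, the same top_k function covers both text-to-image search and reverse image search; only the source of the query embedding changes.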