
CLIP Convnext Base W 320 Laion Aesthetic S13b B82k Augreg

Developed by laion
CLIP model with a ConvNeXt-Base image encoder, trained on the aesthetic subset of LAION-5B, for zero-shot image classification at 320x320 input resolution.
Downloads 4,430
Release Time: 1/10/2023

Model Overview

This model is part of the OpenCLIP project. It uses ConvNeXt-Base as the image encoder, is intended mainly for zero-shot image classification, and was trained on the aesthetic subset of LAION-5B with enhanced data augmentation and regularization.
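As a CLIP model, it maps images and text into a shared embedding space, so zero-shot classification reduces to comparing one image embedding with the embeddings of text prompts that name each candidate class. The sketch below shows only that scoring step, under stated assumptions: the 4-dimensional vectors stand in for real encoder outputs, the logit scale of 100 is an assumed typical value, and `zero_shot_probs` is a hypothetical helper, not part of OpenCLIP.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def zero_shot_probs(image_emb, class_text_embs, logit_scale=100.0):
    """Hypothetical helper: softmax over classes for one image embedding.

    Assumption: logit_scale=100 is a typical value for trained CLIP models,
    not a value read from this checkpoint.
    """
    img = l2_normalize(image_emb)
    logits = []
    for text_emb in class_text_embs:
        txt = l2_normalize(text_emb)
        cosine = sum(a * b for a, b in zip(img, txt))
        logits.append(logit_scale * cosine)
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up 4-d stand-ins for encoder outputs (real embeddings are much larger).
image_embedding = [0.9, 0.1, 0.0, 0.2]
text_embeddings = {
    "a photo of a cat": [1.0, 0.0, 0.0, 0.1],
    "a photo of a dog": [0.0, 1.0, 0.0, 0.0],
    "a photo of a car": [0.0, 0.0, 1.0, 0.0],
}
probs = zero_shot_probs(image_embedding, list(text_embeddings.values()))
best = max(zip(text_embeddings, probs), key=lambda kv: kv[1])[0]
print(best)  # → a photo of a cat
```

Because the class list is just a set of text prompts, changing the categories means changing the prompts, with no retraining.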

Model Features

ConvNeXt Architecture Innovation
The first known large-scale CLIP model to use the ConvNeXt architecture, offering an alternative to the ViT and ResNet image encoders of earlier CLIP models.
Enhanced Data Augmentation Strategy
Uses a wider RandomResizedCrop (RRC) scale range and random erasing for data augmentation, plus stochastic depth, to improve regularization.
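RandomResizedCrop picks a random sub-region covering a fraction of the image area drawn from a `scale` range, then resizes it to the input size; widening that range exposes the model to much tighter crops. The sketch below samples crop boxes in that style, loosely modeled on torchvision's `RandomResizedCrop` but simplified (its fallback differs). `sample_rrc_box` is hypothetical, and the two `scale` ranges compared are illustrative values, not the documented training settings of this model.

```python
import math
import random

def sample_rrc_box(width, height, scale=(0.33, 1.0), ratio=(3 / 4, 4 / 3),
                   rng=random, attempts=10):
    """Hypothetical helper: sample a (left, top, crop_w, crop_h) box.

    The crop area is a random fraction of the image area drawn from `scale`,
    and the aspect ratio is drawn log-uniformly from `ratio`. Resizing the
    crop to the model input size is not shown.
    """
    area = width * height
    log_lo, log_hi = math.log(ratio[0]), math.log(ratio[1])
    for _ in range(attempts):
        target_area = area * rng.uniform(*scale)
        aspect = math.exp(rng.uniform(log_lo, log_hi))
        w = round(math.sqrt(target_area * aspect))
        h = round(math.sqrt(target_area / aspect))
        if 0 < w <= width and 0 < h <= height:
            left = rng.randint(0, width - w)
            top = rng.randint(0, height - h)
            return left, top, w, h
    # Simplified fallback: use the whole image (torchvision uses a central crop).
    return 0, 0, width, height

def min_area_frac(boxes, width=640, height=480):
    """Smallest crop area among `boxes`, as a fraction of the full image."""
    return min(w * h for _, _, w, h in boxes) / (width * height)

rng = random.Random(0)
# Illustrative ranges only: a narrow range versus a wider ("extended") one.
narrow = [sample_rrc_box(640, 480, scale=(0.9, 1.0), rng=rng) for _ in range(1000)]
wide = [sample_rrc_box(640, 480, scale=(0.33, 1.0), rng=rng) for _ in range(1000)]
print(f"narrow: {min_area_frac(narrow):.2f}, wide: {min_area_frac(wide):.2f}")
```

Comparing the smallest crop under each range shows what an extended range means in practice; random erasing and stochastic depth are separate regularizers not covered by this sketch.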
High-Resolution Support
Accepts 320x320 input images, a higher resolution than the 224x224 used by many CLIP image encoders, while retaining strong zero-shot accuracy.
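Images of other sizes must be brought to 320x320 before encoding. The sketch below computes only the geometry, assuming the resize-shorter-side-then-center-crop recipe common in CLIP evaluation pipelines; this model's exact preprocessing is not stated here, and pixel resampling and normalization are left out. Both helper names are hypothetical.

```python
def resize_shorter_side(width, height, size=320):
    """Hypothetical helper: scale so the shorter side equals `size`, keeping aspect ratio."""
    if width <= height:
        return size, int(size * height / width)
    return int(size * width / height), size

def center_crop_box(width, height, size=320):
    """Hypothetical helper: (left, top, right, bottom) of a centered size x size crop."""
    left = int(round((width - size) / 2.0))
    top = int(round((height - size) / 2.0))
    return left, top, left + size, top + size

resized = resize_shorter_side(640, 480)   # a 640x480 landscape photo
box = center_crop_box(*resized)
print(resized, box)  # → (426, 320) (53, 0, 373, 320)
```

The center crop discards the edges of the longer side, so content near the borders of wide or tall images may not reach the encoder.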
Trained on Aesthetic Dataset
Trained on a subset of LAION-5B filtered by aesthetic score, which biases the training data toward higher-quality images.

Model Capabilities

Zero-shot Image Classification
Image-Text Retrieval
Image Feature Extraction
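Feature extraction and image-text retrieval fit together: image embeddings are computed once and stored, and a text query's embedding is ranked against them by cosine similarity. In this sketch the embeddings are made-up 3-dimensional stand-ins for encoder outputs, and `retrieve` is a hypothetical helper; a real index would hold the model's much larger embedding vectors.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def retrieve(query_emb, gallery, k=2):
    """Hypothetical helper: ids of the k gallery items most similar to the query."""
    ranked = sorted(gallery, key=lambda item_id: cosine(query_emb, gallery[item_id]),
                    reverse=True)
    return ranked[:k]

# Made-up precomputed image embeddings (feature-extraction output), keyed by file name.
image_index = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "forest.jpg": [0.1, 0.9, 0.2],
    "city.jpg": [0.0, 0.2, 0.9],
}
# Made-up text embedding standing in for a query such as "a sunny beach".
text_query = [1.0, 0.2, 0.1]
print(retrieve(text_query, image_index, k=2))  # → ['beach.jpg', 'forest.jpg']
```

The same ranking works in the other direction (an image query against stored text embeddings), since both modalities share one embedding space.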

Use Cases

Image Understanding
Open-Domain Image Classification
Classifies images into arbitrary, user-specified categories without task-specific training.
Achieves 71.3% zero-shot top-1 accuracy on ImageNet-1k.
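CLIP-style zero-shot evaluation on ImageNet commonly builds each class's classifier weight by averaging the normalized text embeddings of several prompt templates, then reports top-1 accuracy as the fraction of images whose highest-scoring class matches the label. Whether the 71.3% figure used exactly this protocol is not stated here. The sketch uses made-up 3-dimensional embeddings, and all function names are hypothetical.

```python
import math

def normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def ensemble_class_weight(prompt_embs):
    """Average the unit-normalized embeddings of one class's prompts, then re-normalize."""
    units = [normalize(e) for e in prompt_embs]
    mean = [sum(col) / len(units) for col in zip(*units)]
    return normalize(mean)

def predict(image_emb, class_weights):
    """Class whose weight has the highest cosine similarity with the image."""
    img = normalize(image_emb)
    return max(class_weights,
               key=lambda c: sum(a * b for a, b in zip(img, class_weights[c])))

def top1_accuracy(labeled_images, class_weights):
    """Fraction of images whose predicted class matches the label."""
    hits = sum(predict(emb, class_weights) == label for emb, label in labeled_images)
    return hits / len(labeled_images)

# Made-up text embeddings, one per prompt template
# (e.g. "a photo of a {}." and "a drawing of a {}.") for each class.
prompt_embeddings = {
    "cat": [[1.0, 0.1, 0.0], [0.9, 0.0, 0.2]],
    "dog": [[0.1, 1.0, 0.0], [0.0, 0.8, 0.3]],
}
class_weights = {c: ensemble_class_weight(e) for c, e in prompt_embeddings.items()}

# Made-up labeled image embeddings; the last one is ambiguous and gets misclassified.
labeled_images = [
    ([0.95, 0.05, 0.1], "cat"),
    ([0.1, 0.9, 0.1], "dog"),
    ([0.6, 0.7, 0.0], "cat"),
]
print(top1_accuracy(labeled_images, class_weights))
```

Averaging over templates makes each class weight less sensitive to the wording of any single prompt.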
Image-Text Matching
Enables cross-modal matching between images and text descriptions.
Research Applications
Multimodal Model Research
Serves as a base model for research on joint vision-language representation learning.
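CLIP models are trained with a symmetric contrastive objective: within a batch of matched image-text pairs, each image should score highest against its own caption, and each caption against its own image. The sketch below computes that loss for a toy batch. It assumes pre-normalized embeddings and a fixed logit scale of 100 (in CLIP the scale is learned), and `clip_contrastive_loss` is a hypothetical name, not OpenCLIP's loss implementation.

```python
import math

def log_softmax(row):
    """Numerically stable log-softmax of a list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def clip_contrastive_loss(image_embs, text_embs, logit_scale=100.0):
    """Symmetric cross-entropy over the image-text similarity matrix.

    Row i of each input is a matched pair; every other pairing in the batch
    is a negative. Inputs are assumed to be L2-normalized already.
    """
    n = len(image_embs)
    logits = [[logit_scale * sum(a * b for a, b in zip(img, txt)) for txt in text_embs]
              for img in image_embs]
    # Image-to-text: each row should concentrate its probability on the diagonal.
    loss_i2t = sum(-log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image: the same requirement, reading the matrix by columns.
    cols = [list(col) for col in zip(*logits)]
    loss_t2i = sum(-log_softmax(cols[j])[j] for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2

images = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned_texts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
swapped_texts = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
print(clip_contrastive_loss(images, aligned_texts))  # near 0: every pair matches
print(clip_contrastive_loss(images, swapped_texts))  # large: two pairs are mismatched
```

Using the other items in the batch as negatives is what lets the objective scale without any manually labeled negative pairs.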
© 2025 AIbase