# High-precision image understanding
| Model | Author | License | Task | Downloads | Likes | Description |
|---|---|---|---|---|---|---|
| Clip Vitb16 Test Time Registers | amildravid4292 | — | Text-to-Image (Transformers) | 517 | 0 | Vision-language model based on the OpenCLIP ViT-B/16 architecture; introduces test-time registers that refine the internal representation and remove feature-map artifacts. |
| Convnext Xxlarge.clip Laion2b Soup | timm | Apache-2.0 | Image Classification (Transformers) | 220 | 0 | ConvNeXt-XXLarge image encoder built on the CLIP framework and trained by LAION, suited to multimodal tasks. |
| CLIP Convnext Xxlarge Laion2b S34b B82k Augreg | laion | MIT | Text-to-Image | 6,616 | 9 | CLIP ConvNeXt-XXLarge model trained on LAION-2B with OpenCLIP; the first non-ViT architecture to exceed 79% ImageNet zero-shot accuracy. |
| CLIP Convnext Xxlarge Laion2b S34b B82k Augreg Soup | laion | MIT | Text-to-Image | 9,412 | 22 | CLIP ConvNeXt-XXLarge model trained on LAION-2B with OpenCLIP; the first non-ViT image-tower CLIP model to exceed 79% ImageNet top-1 zero-shot accuracy. |
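The zero-shot accuracy figures cited above come from CLIP-style classification: an image embedding is compared against the embeddings of text prompts (one per class) by cosine similarity, and the best-matching prompt determines the predicted label. A minimal sketch of that scoring step, using toy NumPy vectors in place of real model outputs (the embeddings, dimensions, and function name here are illustrative, not taken from any listed model):

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs):
    """Rank class prompts by cosine similarity to the image embedding."""
    # L2-normalize so the dot product equals cosine similarity
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = txt @ img
    return int(np.argmax(sims)), sims

# Toy 4-d embeddings standing in for real CLIP encoder outputs
image = np.array([0.9, 0.1, 0.0, 0.1])
prompts = np.array([
    [0.8, 0.2, 0.1, 0.0],  # e.g. "a photo of a cat"
    [0.0, 0.1, 0.9, 0.3],  # e.g. "a photo of a dog"
])
best, scores = zero_shot_classify(image, prompts)
print(best)  # -> 0: the first prompt is the closest match
```

In practice the same routine is run with real encoders (e.g. via OpenCLIP's model and tokenizer loaders) over all ImageNet class prompts; accuracy is the fraction of images whose top-scoring prompt matches the true label.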