# High-precision image understanding
| Model | Author | License | Task | Downloads | Likes | Description |
|---|---|---|---|---|---|---|
| Clip Vitb16 Test Time Registers | amildravid4292 | — | Text-to-Image (Transformers) | 517 | 0 | Vision-language model based on the OpenCLIP ViT-B/16 architecture; introduces test-time registers that refine the internal representation and remove feature-map artifacts. |
| Convnext Xxlarge.clip Laion2b Soup | timm | Apache-2.0 | Image Classification (Transformers) | 220 | 0 | ConvNeXt-XXLarge image encoder built on the CLIP framework and trained by LAION, suited to multimodal tasks. |
| CLIP Convnext Xxlarge Laion2b S34b B82k Augreg | laion | MIT | Text-to-Image | 6,616 | 9 | CLIP ConvNeXt-XXLarge model trained on LAION-2B with OpenCLIP; the first non-ViT architecture to exceed 79% ImageNet zero-shot accuracy. |
| CLIP Convnext Xxlarge Laion2b S34b B82k Augreg Soup | laion | MIT | Text-to-Image | 9,412 | 22 | CLIP ConvNeXt-XXLarge model trained on LAION-2B with OpenCLIP; the first non-ViT image-tower CLIP model to exceed 79% ImageNet top-1 zero-shot accuracy. |
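The zero-shot accuracy figures cited above come from CLIP-style classification: an image embedding is compared against the embeddings of text prompts (one per class) by cosine similarity, and the best-matching prompt determines the predicted label. A minimal sketch of that scoring step, using toy NumPy vectors in place of real model outputs (the embeddings, dimensions, and function name here are illustrative, not taken from any listed model):

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs):
    """Rank class prompts by cosine similarity to the image embedding."""
    # L2-normalize so the dot product equals cosine similarity
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = txt @ img
    return int(np.argmax(sims)), sims

# Toy 4-d embeddings standing in for real CLIP encoder outputs
image = np.array([0.9, 0.1, 0.0, 0.1])
prompts = np.array([
    [0.8, 0.2, 0.1, 0.0],  # e.g. "a photo of a cat"
    [0.0, 0.1, 0.9, 0.3],  # e.g. "a photo of a dog"
])
best, scores = zero_shot_classify(image, prompts)
print(best)  # -> 0: the first prompt is the closest match
```

In practice the same routine is run with real encoders (e.g. via OpenCLIP's model and tokenizer loaders) over all ImageNet class prompts; accuracy is the fraction of images whose top-scoring prompt matches the true label.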