# Multimodal Image Encoding
| Model | License | Description | Tags | Publisher | Downloads | Likes |
| --- | --- | --- | --- | --- | --- | --- |
| Vit So400m Patch14 Siglip Gap 384.webli | Apache-2.0 | Vision Transformer model based on SigLIP, utilizing global average pooling for image features | Image Classification, Transformers | timm | 96 | 0 |
| Vit Giant Patch14 Clip 224.laion2b | Apache-2.0 | Vision Transformer model based on CLIP architecture, designed for image feature extraction, trained on the laion2B dataset | Image Classification, Transformers | timm | 71 | 0 |
| Vit Base Patch16 Clip 224.laion2b | Apache-2.0 | Vision Transformer model based on CLIP architecture, containing only the image encoder part, suitable for image feature extraction tasks | Image Classification, Transformers | timm | 4,460 | 0 |
| Convnext Large Mlp.clip Laion2b Ft Soup 320 | Apache-2.0 | ConvNeXt-Large image encoder based on CLIP architecture, fine-tuned on the LAION-2B dataset, supporting 320x320 resolution image feature extraction | Image Classification, Transformers | timm | 173 | 0 |
| Convnext Large Mlp.clip Laion2b Augreg | Apache-2.0 | ConvNeXt-Large image encoder based on the CLIP framework, trained on the LAION-2B dataset, supports visual feature extraction | Image Classification, Transformers | timm | 107 | 0 |
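
Several of the entries above encode a patch size and input resolution in their names (for example Patch14 at 384, or Patch16 at 224), and the SigLIP entry mentions global average pooling. The following is a minimal sketch of what those two ideas amount to, using only the Python standard library. The helper names `num_patches` and `global_average_pool` and the toy token values are hypothetical, chosen for illustration; this is not timm's implementation.

```python
def num_patches(image_size: int, patch_size: int) -> int:
    """Count non-overlapping square patches for a square image.

    Assumes any partial patch at the border is dropped (integer division).
    """
    per_side = image_size // patch_size
    return per_side * per_side


def global_average_pool(tokens: list[list[float]]) -> list[float]:
    """Average all patch-token embeddings into a single feature vector."""
    if not tokens:
        raise ValueError("need at least one token")
    dim = len(tokens[0])
    return [sum(tok[i] for tok in tokens) / len(tokens) for i in range(dim)]


# Toy input: 4 patch tokens with 3-dimensional embeddings (made-up numbers).
tokens = [
    [1.0, 0.0, 2.0],
    [3.0, 0.0, 2.0],
    [1.0, 4.0, 2.0],
    [3.0, 4.0, 2.0],
]
feature = global_average_pool(tokens)

print(num_patches(384, 14))  # patch size 14 at 384x384 input
print(num_patches(224, 16))  # patch size 16 at 224x224 input
print(feature)
```

In a real encoder the tokens would come from the transformer's final layer and have hundreds or thousands of dimensions; the pooling step itself is the same element-wise mean.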