AIbase

# Multimodal Image Encoding

Vit So400m Patch14 Siglip Gap 384.webli
Apache-2.0
Vision Transformer model based on SigLIP that uses global average pooling to produce image features
Image Classification Transformers
timm
96
0
Vit Giant Patch14 Clip 224.laion2b
Apache-2.0
Vision Transformer model based on the CLIP architecture, designed for image feature extraction and trained on the LAION-2B dataset
Image Classification Transformers
timm
71
0
Vit Base Patch16 Clip 224.laion2b
Apache-2.0
Vision Transformer model based on the CLIP architecture that contains only the image encoder, suitable for image feature extraction tasks
Image Classification Transformers
timm
4,460
0
Convnext Large Mlp.clip Laion2b Ft Soup 320
Apache-2.0
ConvNeXt-Large image encoder based on the CLIP architecture, fine-tuned on the LAION-2B dataset, supporting image feature extraction at 320x320 resolution
Image Classification Transformers
timm
173
0
Convnext Large Mlp.clip Laion2b Augreg
Apache-2.0
ConvNeXt-Large image encoder based on the CLIP framework, trained on the LAION-2B dataset, supporting visual feature extraction
Image Classification Transformers
timm
107
0
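
The first entry in this list is described as using global average pooling for image features. As a rough illustration of that idea only, the sketch below averages a handful of toy patch-token vectors into a single image-level feature vector. The function name `global_average_pool` and the toy values are assumptions made for this example; it is not the timm implementation, and real encoders work on tensors with far more tokens and dimensions.

```python
from typing import List


def global_average_pool(patch_tokens: List[List[float]]) -> List[float]:
    """Average a sequence of patch-token vectors into one feature vector."""
    if not patch_tokens:
        raise ValueError("expected at least one patch token")
    dim = len(patch_tokens[0])
    if any(len(tok) != dim for tok in patch_tokens):
        raise ValueError("all patch tokens must have the same dimension")
    n = len(patch_tokens)
    # Mean over the token axis, computed independently for each feature dimension.
    return [sum(tok[i] for tok in patch_tokens) / n for i in range(dim)]


# Toy input (hypothetical values): 4 patch tokens, each 3-dimensional.
tokens = [
    [1.0, 0.0, 2.0],
    [3.0, 4.0, 2.0],
    [5.0, 0.0, 2.0],
    [7.0, 4.0, 2.0],
]
feature = global_average_pool(tokens)
print(feature)  # [4.0, 2.0, 2.0]
```

Pooling this way gives every patch equal weight, so the output has the same dimensionality as a single token regardless of how many patches the image was split into.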
© 2025 AIbase