
# SigLIP Vision Encoder

**Vit So400m Patch14 Siglip Gap 448.pali Mix** (timm, Apache-2.0)
Tags: Text-to-Image, Transformers · Downloads: 15 · Likes: 0
A SigLIP-based vision encoder that uses global average pooling (GAP), suitable for multimodal tasks.

**Vit Large Patch16 Siglip 384.webli** (timm, Apache-2.0)
Tags: Image Classification, Transformers · Downloads: 64 · Likes: 0
A SigLIP-based Vision Transformer containing only the image encoder and using the original attention pooling, suitable for image feature extraction tasks.

**Vit Base Patch16 Siglip 384.webli** (timm, Apache-2.0)
Tags: Image Classification, Transformers · Downloads: 64 · Likes: 1
A SigLIP-based Vision Transformer containing only the image encoder and using the original attention pooling mechanism.

**Vit So400m Patch14 Siglip 224.webli** (timm, Apache-2.0)
Tags: Image Classification, Transformers · Downloads: 123 · Likes: 1
A SigLIP-based Vision Transformer containing only the image encoder and using the original attention pooling mechanism.

**nanoLLaVA-1.5** (qnguyen3, Apache-2.0)
Tags: Image-to-Text, Transformers, English · Downloads: 442 · Likes: 109
nanoLLaVA-1.5 is a vision-language model with under 1 billion parameters, designed specifically for edge devices, compact yet powerful.
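
The timm entries above are image-encoder-only checkpoints, so they can be loaded directly with the timm library for image feature extraction. The sketch below is an illustration rather than part of the listing: the checkpoint name `vit_base_patch16_siglip_384.webli` and the preprocessing helpers follow standard timm usage, and the image path is a placeholder.

```python
# Minimal sketch: extract pooled image features from a SigLIP image encoder via timm.
# Assumes timm >= 0.9 and an internet connection to download pretrained weights.
import timm
import torch
from PIL import Image

# num_classes=0 drops the classification head so the model returns pooled features.
model = timm.create_model(
    "vit_base_patch16_siglip_384.webli",
    pretrained=True,
    num_classes=0,
)
model.eval()

# Build the preprocessing pipeline that matches the pretrained weights.
data_config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**data_config, is_training=False)

image = Image.open("example.jpg").convert("RGB")  # placeholder image path
with torch.no_grad():
    features = model(transform(image).unsqueeze(0))  # shape: (1, embed_dim)
print(features.shape)
```

The resulting feature vector can be used for retrieval, clustering, or as the vision tower of a larger multimodal pipeline; nanoLLaVA-1.5, by contrast, is a full vision-language model and is loaded through its own repository on Hugging Face rather than through timm.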