Vit Giantopt Patch16 Siglip Gap 384.v2 Webli
A ViT image encoder based on SigLIP 2 that uses global average pooling in place of the attention pooling head, suitable for image feature extraction tasks.
Downloads 21
Release Time : 2/21/2025
Model Overview
This model is a SigLIP 2 ViT image encoder packaged for timm and used primarily for image feature extraction. It is equivalent to the image tower of the ViT-gopt-16-SigLIP2-384 model on Hugging Face, except that it uses the global average pooling (gap) variant.
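A minimal sketch of how the encoder might be loaded through timm for feature extraction. The identifier below is an assumption inferred from the page title (the page itself does not state it), and timm, torch, and a PIL image are third-party dependencies that this page does not mention. The helper num_patch_tokens only works out the patch grid implied by the model name (384-pixel input, 16-pixel patches).

```python
# Assumed timm identifier, inferred from the page title; verify before use.
MODEL_NAME = "vit_giantopt_patch16_siglip_gap_384.v2_webli"


def num_patch_tokens(image_size=384, patch_size=16):
    """Number of patch tokens for a square input split into square patches."""
    side = image_size // patch_size
    return side * side


def load_encoder(pretrained=True):
    """Create the encoder without a classifier head (num_classes=0)."""
    import timm  # third-party; imported lazily so the helper above runs without it

    model = timm.create_model(MODEL_NAME, pretrained=pretrained, num_classes=0)
    return model.eval()


def extract_features(model, image):
    """Return the pooled feature vector for one PIL image (batch of size 1)."""
    import timm
    import torch

    # Build the preprocessing pipeline from the model's own data config.
    data_config = timm.data.resolve_model_data_config(model)
    transform = timm.data.create_transform(**data_config, is_training=False)
    with torch.no_grad():
        return model(transform(image).unsqueeze(0))
```

With the default arguments, num_patch_tokens() describes a 24 x 24 grid of patches. Loading the pretrained weights downloads a very large checkpoint, so load_encoder is best called once and the result reused.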
Model Features
SigLIP 2 Architecture
Built on SigLIP 2, which improves on the original SigLIP in semantic understanding and localization
Global Average Pooling
Pools the patch tokens with global average pooling (gap) instead of the attention pooling head, which is removed
WebLI Dataset Training
Pretrained on the WebLI dataset, giving it visual representations that transfer across a broad range of tasks
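To make the global average pooling feature above concrete, here is a small dependency-free sketch. It is an illustration of the pooling operation only, not the timm implementation: the encoder's patch tokens are averaged element-wise into a single feature vector, with no learned pooling parameters. The function name and the toy token values are hypothetical.

```python
def global_average_pool(tokens):
    """Average a list of equal-length token vectors into one feature vector."""
    if not tokens:
        raise ValueError("need at least one token")
    dim = len(tokens[0])
    if any(len(t) != dim for t in tokens):
        raise ValueError("all tokens must have the same dimension")
    count = len(tokens)
    # Mean over the token axis, computed separately for each feature dimension.
    return [sum(t[d] for t in tokens) / count for d in range(dim)]


# Two toy 2-dimensional "patch tokens".
pooled = global_average_pool([[1.0, 2.0], [3.0, 4.0]])
```

An attention pooling head would instead weight the tokens with learned attention scores; the gap variant drops those parameters and treats every patch equally.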
Model Capabilities
Image Feature Extraction
Visual Semantic Understanding
Image Localization
Use Cases
Computer Vision
Image Retrieval
Uses the extracted image features to retrieve visually similar images
Visual Question Answering
Serves as a visual encoder for visual question answering systems
Multimodal Applications
Image-Text Matching
Used for image and text matching tasks
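For the image retrieval use case listed above, one common approach is to rank gallery images by the cosine similarity between their feature vectors and the query's. The sketch below assumes the features have already been extracted and are held as plain Python lists; the function names and the toy vectors are hypothetical, and a real system would typically use a vector index instead of a linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, gallery):
    """Return (name, score) pairs for the gallery, most similar first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in gallery.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 2-dimensional features standing in for real encoder outputs.
gallery = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
ranked = rank_by_similarity([1.0, 0.1], gallery)
```

Cosine similarity ignores vector length, so features do not need to be normalized beforehand.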
© 2025 AIbase