vit_base_patch16_siglip_gap_224.v2_webli
Vision Transformer model based on SigLIP 2, utilizing global average pooling for image features
Downloads: 303
Release Time: 2/21/2025
Model Overview
This is a SigLIP 2 ViT image encoder packaged for timm. The attention pooling head has been removed, and image features are extracted with global average pooling (GAP).
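For reference, a minimal sketch of loading the encoder with timm and extracting a pooled feature vector, assuming the timm model name matches the identifier in the title and using a placeholder image path:

```python
import torch
import timm
from PIL import Image

# Assumed timm identifier, matching this card's title; num_classes=0
# drops any classifier head so the model returns pooled features.
model = timm.create_model(
    "vit_base_patch16_siglip_gap_224.v2_webli",
    pretrained=True,
    num_classes=0,
)
model.eval()

# Build the preprocessing the checkpoint expects (224x224 input, SigLIP normalization).
config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**config, is_training=False)

img = Image.open("example.jpg").convert("RGB")  # placeholder image path
x = transform(img).unsqueeze(0)                 # (1, 3, 224, 224)

with torch.no_grad():
    features = model(x)  # globally average pooled features, e.g. (1, 768) for ViT-Base

print(features.shape)
```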
Model Features
Global Average Pooling
Uses global average pooling (GAP) in place of an attention pooling head, simplifying feature extraction
SigLIP 2 Improvements
Based on the SigLIP 2 architecture with enhanced semantic understanding and localization capabilities
Dense Feature Extraction
Produces dense per-patch feature representations suitable for localization-oriented downstream tasks (see the sketch after this list)
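As an illustration, the sketch below pulls unpooled per-patch tokens via timm's forward_features and reshapes them into a spatial feature map; it assumes the GAP variant carries no class or register tokens, so a 224x224 input yields a 14x14 patch grid.

```python
import torch
import timm

model = timm.create_model(
    "vit_base_patch16_siglip_gap_224.v2_webli",  # assumed timm identifier
    pretrained=True,
    num_classes=0,
)
model.eval()

x = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed image batch

with torch.no_grad():
    tokens = model.forward_features(x)  # unpooled patch tokens, expected (1, 196, 768)

# Rearrange the token sequence into a 2D feature map for dense downstream use.
b, n, c = tokens.shape
h = w = int(n ** 0.5)
feature_map = tokens.transpose(1, 2).reshape(b, c, h, w)
print(feature_map.shape)  # expected (1, 768, 14, 14)
```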
Model Capabilities
Image Feature Extraction
Visual Semantic Understanding
Multimodal Task Support
Use Cases
Computer Vision
Image Retrieval
Uses the extracted image features for similar-image search (see the retrieval sketch after this list)
Multimodal Tasks
Serves as a visual encoder for vision-language joint tasks
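A possible retrieval setup, sketched under the same assumptions (timm identifier from the title, placeholder image paths): embed the query and gallery images, L2-normalize, and rank by cosine similarity.

```python
import torch
import torch.nn.functional as F
import timm
from PIL import Image

model = timm.create_model(
    "vit_base_patch16_siglip_gap_224.v2_webli",  # assumed timm identifier
    pretrained=True,
    num_classes=0,
)
model.eval()
config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**config, is_training=False)

def embed(path: str) -> torch.Tensor:
    """Return an L2-normalized GAP feature vector for one image."""
    img = Image.open(path).convert("RGB")
    with torch.no_grad():
        feat = model(transform(img).unsqueeze(0))
    return F.normalize(feat, dim=-1)

query = embed("query.jpg")                   # placeholder paths
gallery_paths = ["a.jpg", "b.jpg", "c.jpg"]
gallery = torch.cat([embed(p) for p in gallery_paths])

scores = (query @ gallery.T).squeeze(0)      # cosine similarities
for idx in scores.argsort(descending=True):  # most similar first
    print(gallery_paths[int(idx)], float(scores[idx]))
```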