Vit So400m Patch16 Siglip Gap 256.v2 Webli
A SigLIP 2 ViT image encoder that uses global average pooling in place of the attention pooling head, suited to image feature extraction.
Release Time: 2/21/2025
Model Overview
This model is the image-encoder half of SigLIP 2, packaged for use with timm. It replaces the attention pooling head with global average pooling (GAP) and is intended primarily for image feature extraction.
Model Features
SigLIP 2 Architecture
Built on SigLIP 2, which improves on the original SigLIP in semantic understanding, localization, and dense feature extraction.
Global Average Pooling
Uses global average pooling (GAP) instead of an attention pooling head to simplify the model structure.
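As a rough illustration of what global average pooling does, the sketch below averages a handful of patch-token embeddings into a single feature vector. It is a toy in plain Python, not the timm implementation: the function name global_average_pool and the tiny 4-token, 3-dimensional input are made up for the example. Going by the model name (256-pixel input, 16-pixel patches), the real encoder would presumably pool over 16 x 16 = 256 patch tokens.

```python
from typing import List


def global_average_pool(tokens: List[List[float]]) -> List[float]:
    """Average a sequence of token embeddings over the token axis.

    tokens: one embedding (list of floats) per image patch.
    Returns a single embedding with the same width as each token.
    """
    if not tokens:
        raise ValueError("need at least one token to pool")
    n = len(tokens)
    dim = len(tokens[0])
    # Mean of each embedding dimension across all tokens.
    return [sum(tok[d] for tok in tokens) / n for d in range(dim)]


# Hypothetical toy input: 4 patch tokens, embedding width 3.
patch_tokens = [
    [1.0, 0.0, 2.0],
    [3.0, 0.0, 2.0],
    [1.0, 4.0, 2.0],
    [3.0, 4.0, 2.0],
]

pooled = global_average_pool(patch_tokens)
print(pooled)  # [2.0, 2.0, 2.0]
```

Unlike an attention pooling head, this step has no learned parameters: every patch contributes equally to the final feature.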
Multilingual Support
Trained on the multilingual WebLI dataset, so the learned features are not limited to English-language data.
Model Capabilities
Image Feature Extraction
Semantic Understanding
Visual Localization
Use Cases
Computer Vision
Image Retrieval
Efficient image retrieval using extracted image features.
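One common way to do retrieval with such features is to rank gallery images by cosine similarity to the query feature. The sketch below assumes the feature vectors have already been extracted; the file names, the 3-dimensional vectors, and the helper names cosine_similarity and rank_gallery are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_gallery(
    query: List[float],
    gallery: Dict[str, List[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k gallery entries, most similar to the query first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in gallery.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical precomputed image features (real ones would be much wider).
gallery = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.1, 0.9, 0.0],
    "car.jpg": [0.0, 0.1, 0.9],
}
query = [1.0, 0.2, 0.0]

results = rank_gallery(query, gallery, top_k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

For a large gallery, the brute-force loop would normally be replaced by a batched matrix product or an approximate nearest-neighbor index; the ranking logic stays the same.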
Visual Question Answering
Can serve as the image encoder in a vision-language model.