Vit So400m Patch16 Siglip Gap 512.v2 Webli
A ViT image encoder based on SigLIP 2, utilizing global average pooling, suitable for vision-language tasks.
Downloads 21
Release Time: 2/21/2025
Model Overview
This model is a SigLIP 2 ViT image encoder specifically designed for timm, with the attention pooling head removed and replaced by global average pooling. It is primarily used for image feature extraction and vision-language tasks.
Model Features
SigLIP 2 Architecture
Built on the SigLIP 2 architecture, which targets improved semantic understanding, localization, and dense features.
Global Average Pooling
The attention pooling head is removed; the image embedding is instead the mean of the encoder's output patch tokens.
Large-scale Pretraining
Pretrained on the WebLI dataset, giving it strong general-purpose image features.
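The global average pooling mentioned above can be illustrated with a minimal, framework-free sketch. It assumes the encoder emits one feature vector per image patch; the function name `global_average_pool` and the toy two-dimensional tokens are hypothetical and are not part of timm or of this model's code.

```python
from statistics import fmean


def global_average_pool(tokens):
    """Average equal-length token vectors into a single embedding.

    `tokens` is a list of per-patch feature vectors (lists of floats).
    This is an illustrative stand-in, not the model's actual implementation.
    """
    if not tokens:
        raise ValueError("need at least one token")
    dim = len(tokens[0])
    if any(len(t) != dim for t in tokens):
        raise ValueError("all tokens must have the same length")
    # Mean over the token axis, computed independently per feature dimension.
    return [fmean(t[i] for t in tokens) for i in range(dim)]


# Three toy "patch tokens" with two features each.
patch_tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
embedding = global_average_pool(patch_tokens)
print(embedding)  # [3.0, 4.0]
```

Unlike an attention pooling head, this step has no learned parameters: every patch token contributes equally to the final embedding.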
Model Capabilities
Image Feature Extraction
Vision-Language Task Processing
Use Cases
Computer Vision
Image Classification
Extracted image features can be fed to a lightweight classifier, such as a linear probe or nearest-centroid rule, to categorize images.
Vision-Language Tasks
Can serve as the image encoder in vision-language systems, such as image captioning, when paired with a separate text model.
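As a rough illustration of the image classification use case, the sketch below classifies precomputed feature vectors with a nearest-centroid rule under cosine similarity. It assumes the features were already extracted by the encoder; the helper names (`fit_centroids`, `predict`), the labels, and the tiny two-dimensional vectors are all hypothetical, and a linear probe or other classifier could be used in its place.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def fit_centroids(features, labels):
    """Compute the mean feature vector for each class label."""
    sums, counts = {}, {}
    for feat, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(feat))
        for i, value in enumerate(feat):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, feature):
    """Return the label whose centroid is most similar to `feature`."""
    return max(centroids, key=lambda label: cosine(centroids[label], feature))


# Toy stand-ins for pooled image embeddings (real ones are much larger).
train_features = [[1.0, 0.1], [0.9, 0.0], [0.1, 1.0], [0.0, 0.8]]
train_labels = ["cat", "cat", "dog", "dog"]

centroids = fit_centroids(train_features, train_labels)
print(predict(centroids, [0.8, 0.2]))  # cat
print(predict(centroids, [0.2, 0.7]))  # dog
```

Because the encoder stays frozen in this setup, only the per-class centroids need to be computed from labeled examples.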
© 2025 AIbase