Vit Base Patch16 Siglip 384.v2 Webli
Developed by timm
Vision Transformer image encoder from SigLIP 2, intended for image feature extraction and pre-trained on the WebLI dataset
Downloads 330
Release Time : 2/21/2025
Model Overview
This is a SigLIP 2 Vision Transformer (ViT) model that contains only the image encoder, which makes it suitable for image feature extraction. As the name indicates, it is a ViT-Base variant with 16x16 patches and 384x384 input resolution, and it was pre-trained with a sigmoid loss.
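A minimal sketch of how an image-encoder-only checkpoint like this is typically loaded through timm. The model identifier below is inferred from the page title and is an assumption, as are the helper names `load_encoder` and `embed_image`; the sketch also assumes `timm`, `torch`, and `Pillow` are installed.

```python
# Sketch only: the model name is inferred from the page title (assumption),
# and load_encoder / embed_image are hypothetical helper names.
MODEL_NAME = "vit_base_patch16_siglip_384.v2_webli"


def load_encoder(model_name: str = MODEL_NAME, pretrained: bool = True):
    """Create the image encoder and its matching preprocessing transform."""
    import timm  # imported lazily so this module loads without timm installed

    # num_classes=0 requests the model without a classifier head,
    # so calling it returns pooled image features.
    model = timm.create_model(model_name, pretrained=pretrained, num_classes=0)
    model.eval()

    # Build the preprocessing (resize, normalization) the checkpoint expects.
    data_config = timm.data.resolve_model_data_config(model)
    transform = timm.data.create_transform(**data_config, is_training=False)
    return model, transform


def embed_image(image_path: str):
    """Return a pooled embedding tensor of shape (1, feature_dim) for one image file."""
    import torch
    from PIL import Image

    model, transform = load_encoder()
    image = Image.open(image_path).convert("RGB")
    batch = transform(image).unsqueeze(0)  # add a batch dimension
    with torch.no_grad():
        return model(batch)
```

Calling `embed_image("photo.jpg")` (a placeholder path) would download the weights on first use, so it needs network access and enough memory for a ViT-Base model.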
Model Features
SigLIP 2 Improvements
Built on SigLIP 2, which improves semantic understanding and localization over the original SigLIP
Dense Feature Extraction
Capable of extracting dense feature representations from images
Large-scale Pre-training
Pre-trained on the large-scale WebLI dataset
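To make the dense-feature claim concrete: with 384x384 input and 16x16 patches (both taken from the model name), the encoder sees a 24x24 grid, i.e. 576 patch positions. The sketch below maps a patch-token index back to its pixel region. It assumes tokens are ordered row by row with no extra prefix tokens, which should be checked against the actual output of the model (for example via timm's `forward_features`); the function names are hypothetical.

```python
def patch_grid(image_size: int = 384, patch_size: int = 16) -> tuple[int, int]:
    """Return (rows, cols) of the patch grid for a square input image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    side = image_size // patch_size
    return side, side


def token_to_box(index: int, image_size: int = 384, patch_size: int = 16):
    """Map a patch-token index to its (left, top, right, bottom) pixel box.

    Assumes row-major token order and no class/prefix tokens (an assumption).
    """
    rows, cols = patch_grid(image_size, patch_size)
    if not 0 <= index < rows * cols:
        raise IndexError("token index outside the patch grid")
    row, col = divmod(index, cols)
    left, top = col * patch_size, row * patch_size
    return (left, top, left + patch_size, top + patch_size)


rows, cols = patch_grid()
print(rows, cols, rows * cols)  # grid size and number of patch tokens
print(token_to_box(25))         # pixel region covered by token 25
```

A mapping like this is what lets per-patch features be overlaid on the input image for localization-style analysis.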
Model Capabilities
Image Feature Extraction
Visual Semantic Understanding
Image Localization
Use Cases
Computer Vision
Image Retrieval
Using extracted image features for similar image retrieval
Visual Localization
Identifying and locating key regions in images
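The retrieval use case can be sketched as ranking stored image embeddings by cosine similarity to a query embedding. This is a generic nearest-neighbor illustration, not a method prescribed by the model authors; `cosine_similarity`, `top_k`, and the toy vectors are hypothetical stand-ins for real embeddings produced by the encoder.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, gallery, k=3):
    """Return the k (name, score) pairs from gallery most similar to query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in gallery.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy 2-D vectors standing in for real image embeddings.
gallery = {"cat.jpg": [1.0, 0.0], "dog.jpg": [0.0, 1.0], "both.jpg": [1.0, 1.0]}
for name, score in top_k([1.0, 0.1], gallery, k=2):
    print(f"{name}: {score:.3f}")
```

For large galleries, a brute-force scan like this is usually replaced by a vector index, but the ranking principle stays the same.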