ViT So400m Patch16 SigLIP 384.v2 WebLI
Vision Transformer model based on SigLIP 2, designed for image feature extraction and pre-trained on the WebLI dataset
Downloads 2,073
Release Time: 2/21/2025
Model Overview
This model is the vision encoder of SigLIP 2. It uses the ViT architecture and is suited to image understanding and feature extraction tasks
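A minimal sketch of loading the encoder for feature extraction. It assumes the checkpoint is available through the `timm` library under the identifier `vit_so400m_patch16_siglip_384.v2_webli` (inferred from the model name; this page does not state it) and that `timm`, `torch`, and `Pillow` are installed. The helper name `extract_features` is hypothetical.

```python
# Assumed timm identifier, inferred from the model name on this page.
MODEL_NAME = "vit_so400m_patch16_siglip_384.v2_webli"


def extract_features(image_path, model_name=MODEL_NAME):
    """Return one pooled feature vector for the image at image_path."""
    # Heavy dependencies are imported here so that this module can be
    # imported without them; the first call downloads the weights.
    import timm
    import torch
    from PIL import Image

    # num_classes=0 removes the classification head, so the model
    # returns pooled image features instead of class logits.
    model = timm.create_model(model_name, pretrained=True, num_classes=0)
    model.eval()

    # Build the preprocessing (resize, normalization) the checkpoint expects.
    data_config = timm.data.resolve_model_data_config(model)
    transform = timm.data.create_transform(**data_config, is_training=False)

    image = Image.open(image_path).convert("RGB")
    batch = transform(image).unsqueeze(0)  # add a batch dimension

    with torch.no_grad():
        features = model(batch)
    return features[0]
```

Calling `extract_features("photo.jpg")` (a placeholder path) should return a single 1-D tensor. For per-patch rather than pooled features, `model.forward_features(batch)` is the corresponding timm call.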
Model Features
SigLIP 2 Architecture
Uses the SigLIP 2 architecture, which improves semantic understanding and localization over the original SigLIP
Dense Feature Extraction
Extracts dense (per-patch) feature representations from images
Large-scale Pre-training
Pre-trained on the large-scale WebLI dataset
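To make the dense feature extraction above concrete, the sketch below works out the patch grid implied by the model name, reading "384" as a square 384x384 input and "Patch16" as 16x16 patches. It further assumes that the dense output holds one token per patch in row-major order with no extra class or prefix tokens; this page does not confirm that. The helper names `patch_grid` and `token_to_box` are hypothetical.

```python
def patch_grid(image_size=384, patch_size=16):
    """Return (rows, cols) of the patch grid for a square input."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    side = image_size // patch_size
    return side, side


def token_to_box(index, image_size=384, patch_size=16):
    """Map a row-major patch index to its pixel box (left, top, right, bottom)."""
    rows, cols = patch_grid(image_size, patch_size)
    if not 0 <= index < rows * cols:
        raise IndexError("token index out of range")
    row, col = divmod(index, cols)
    left, top = col * patch_size, row * patch_size
    return left, top, left + patch_size, top + patch_size


rows, cols = patch_grid()
print(rows, cols, rows * cols)  # 24 24 576
print(token_to_box(25))         # (16, 16, 32, 32)
```

Under these assumptions each image yields 576 patch features, and a mapping like `token_to_box` is one simple way to tie a feature back to an image region for localization-style tasks.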
Model Capabilities
Image Feature Extraction
Visual Semantic Understanding
Image Localization
Use Cases
Computer Vision
Image Retrieval
Uses extracted image features to retrieve similar images
Vision-Language Tasks
Serves as the vision encoder in multimodal (vision-language) tasks
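The image retrieval use case above can be sketched as a nearest-neighbor search over feature vectors. The example ranks gallery images by cosine similarity to a query; cosine similarity is a common choice, not one this page specifies, and the tiny 3-dimensional vectors are toy stand-ins for real model features. The function names and file names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def retrieve(query, gallery, top_k=3):
    """Return the top_k (name, score) pairs from gallery, best match first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in gallery.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional vectors standing in for real image features.
gallery = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.7, 0.7, 0.0],
    "car.jpg": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]
print([name for name, _ in retrieve(query, gallery, top_k=2)])
# ['cat.jpg', 'dog.jpg']
```

For a large gallery, the same ranking is usually done with a vectorized matrix product or an approximate nearest-neighbor index rather than a Python loop.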
© 2025 AIbase