Vit So400m Patch14 Siglip 224.v2 Webli
A Vision Transformer model based on the SigLIP 2 architecture, designed for image feature extraction and pretrained on the WebLI dataset.
Downloads 7,005
Release Time: 2/21/2025
Model Overview
This model is the visual encoder component of SigLIP 2. It uses the ViT architecture and is suited to image understanding and feature extraction tasks.
Model Features
SigLIP 2 Enhancements
Incorporates the improved SigLIP 2 architecture, with enhanced semantic understanding and localization capabilities
Dense Feature Extraction
Capable of extracting dense feature representations from images
Large-scale Pretraining
Pretrained on the large-scale WebLI dataset
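As a rough illustration of what "dense" features mean here, the sketch below derives the patch grid implied by the model name (224-pixel input, 14-pixel patches) and reshapes a flat token sequence into that grid. The helper names `patch_grid` and `tokens_to_grid` are hypothetical, and the sketch assumes one token per patch in row-major order; whether a given implementation adds extra tokens is not stated on this page.

```python
def patch_grid(image_size: int = 224, patch_size: int = 14) -> tuple:
    """Return (patches_per_side, total_patch_tokens) for a square input."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side, per_side * per_side


def tokens_to_grid(tokens: list, per_side: int) -> list:
    """Reshape a flat, row-major list of patch tokens into per_side rows.

    Assumption: exactly one token per patch, with no extra tokens mixed in.
    """
    if len(tokens) != per_side * per_side:
        raise ValueError("token count does not match the grid")
    return [tokens[r * per_side:(r + 1) * per_side] for r in range(per_side)]


# 224 / 14 = 16 patches per side, so 16 * 16 = 256 patch tokens per image.
per_side, n_tokens = patch_grid()
```

Each cell of the resulting grid corresponds to one 14x14 pixel region of the input, which is what makes the features usable for localization-style tasks.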
Model Capabilities
Image Feature Extraction
Visual Semantic Understanding
Image Localization
Use Cases
Computer Vision
Image Retrieval
Uses extracted image features to retrieve similar images
Visual Question Answering
Serves as a visual encoder for visual question answering systems
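The image retrieval use case above can be sketched as a nearest-neighbour search over embeddings. This is a minimal illustration, not the model's own API: it assumes each image has already been encoded into a fixed-length vector, and the function names and the toy two-dimensional vectors are hypothetical stand-ins for real encoder outputs.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def retrieve(query, gallery, top_k=3):
    """Return the top_k (name, score) pairs from gallery, most similar first.

    gallery maps an image name to its precomputed embedding (assumed input).
    """
    scored = [(name, cosine_similarity(query, emb)) for name, emb in gallery.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings standing in for real image features.
gallery = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
results = retrieve([1.0, 0.1], gallery, top_k=2)
```

For a large gallery, a brute-force loop like this would typically be replaced by a vector index, but the ranking principle is the same.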
Multimodal Applications
Image-Text Matching
Pairs with a text encoder to perform image-text matching
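A minimal sketch of image-text matching in the SigLIP style: each image-text pair is scored independently by passing a scaled, shifted cosine similarity through a sigmoid. The function name `match_probability` is hypothetical, and the `scale` and `bias` defaults are illustrative placeholders; in a trained model they are learned parameters, and the embeddings would come from the image and text encoders rather than being typed in by hand.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def match_probability(image_emb, text_emb, scale=10.0, bias=-10.0):
    """Pairwise sigmoid score for one image-text pair.

    scale and bias are placeholder values (assumption), not the
    parameters of any released checkpoint.
    """
    img, txt = l2_normalize(image_emb), l2_normalize(text_emb)
    logit = scale * sum(x * y for x, y in zip(img, txt)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy embeddings: an aligned pair scores higher than an orthogonal one.
aligned = match_probability([1.0, 0.0], [1.0, 0.0])
unrelated = match_probability([1.0, 0.0], [0.0, 1.0])
```

Because each pair gets its own sigmoid, scores across candidate captions do not need to sum to one, unlike a softmax over a batch.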
© 2025 AIbase