ViT Base Patch16 SigLIP 384.v2 WebLI

Developed by timm
A Vision Transformer image encoder from SigLIP 2, designed for image feature extraction and pre-trained on the WebLI dataset
Downloads 330
Release Time: 2/21/2025

Model Overview

This is the image encoder of a SigLIP 2 model, packaged on its own (without the text encoder) for image feature extraction tasks. It uses the Vision Transformer (ViT) architecture and was pre-trained with a sigmoid loss.
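As a rough illustration of image feature extraction with this encoder, the sketch below loads it through timm and returns one pooled embedding per image. The model identifier `vit_base_patch16_siglip_384.v2_webli` is inferred from the page title, and the helper name `embed_image` is hypothetical; treat this as a minimal sketch under those assumptions rather than the official usage.

```python
# Model identifier inferred from the page title (an assumption).
MODEL_NAME = "vit_base_patch16_siglip_384.v2_webli"


def embed_image(image, model_name=MODEL_NAME):
    """Return one pooled embedding for a PIL image as a 1-D torch tensor."""
    # Heavy dependencies are imported lazily so this module stays cheap to import.
    import timm
    import torch

    # num_classes=0 asks timm for pooled features instead of classifier logits.
    model = timm.create_model(model_name, pretrained=True, num_classes=0).eval()

    # Build the preprocessing (resize, normalization) from the model's own config.
    data_config = timm.data.resolve_model_data_config(model)
    transform = timm.data.create_transform(**data_config, is_training=False)

    with torch.no_grad():
        batch = transform(image.convert("RGB")).unsqueeze(0)  # (1, 3, H, W)
        features = model(batch)  # (1, feature_dim)
    return features[0]
```

A call such as `embed_image(Image.open("photo.jpg"))` (the file name is a placeholder) downloads the weights on first use. In practice the model would be created once and reused across images rather than rebuilt per call.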

Model Features

SigLIP 2 Improvements
Built on SigLIP 2, which improves semantic understanding and localization over the original SigLIP
Dense Feature Extraction
Extracts dense (per-patch) feature representations from images
Large-scale Pre-training
Pre-trained on the large-scale WebLI dataset
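To make the dense-feature idea concrete: the model name implies a 384-pixel input split into 16-pixel patches, which gives a 24 x 24 grid of 576 patch tokens. The sketch below works through that arithmetic and reshapes a token array into a spatial map. It assumes the token sequence holds patch tokens only, in row-major order, and the helper names (`patch_grid`, `tokens_to_map`, `patch_box`) are hypothetical.

```python
import numpy as np

IMAGE_SIZE = 384  # input resolution, taken from the model name
PATCH_SIZE = 16   # patch side length, taken from the model name


def patch_grid(image_size=IMAGE_SIZE, patch_size=PATCH_SIZE):
    """Return (rows, cols) of the patch grid for a square input."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    side = image_size // patch_size
    return side, side


def tokens_to_map(tokens, image_size=IMAGE_SIZE, patch_size=PATCH_SIZE):
    """Reshape (num_patches, dim) patch tokens into a (rows, cols, dim) map."""
    rows, cols = patch_grid(image_size, patch_size)
    tokens = np.asarray(tokens)
    if tokens.shape[0] != rows * cols:
        raise ValueError(f"expected {rows * cols} tokens, got {tokens.shape[0]}")
    return tokens.reshape(rows, cols, -1)


def patch_box(index, image_size=IMAGE_SIZE, patch_size=PATCH_SIZE):
    """Return the (left, top, right, bottom) pixel box of a row-major patch index."""
    rows, cols = patch_grid(image_size, patch_size)
    row, col = divmod(index, cols)
    left, top = col * patch_size, row * patch_size
    return left, top, left + patch_size, top + patch_size
```

With timm, per-patch tokens of this kind would typically come from `model.forward_features(batch)`; whether extra non-patch tokens are present should be checked against the actual output shape before reshaping.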

Model Capabilities

Image Feature Extraction
Visual Semantic Understanding
Image Localization

Use Cases

Computer Vision
Image Retrieval
Using extracted image features for similar image retrieval
Visual Localization
Identifying and locating key regions in images
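The image retrieval use case above can be sketched as a nearest-neighbor search over extracted embeddings. The example assumes the features have already been computed and stacked into a NumPy array, and it uses cosine similarity as the ranking measure; that choice and the helper names are illustrative assumptions, not something the page specifies.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    x = np.asarray(x, dtype=np.float64)
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def top_k_similar(query, gallery, k=5):
    """Return (indices, scores) of the k gallery rows most similar to query."""
    q = l2_normalize(query)      # (dim,)
    g = l2_normalize(gallery)    # (num_images, dim)
    scores = g @ q               # cosine similarity per gallery row
    k = min(k, scores.shape[0])
    order = np.argsort(-scores)[:k]  # highest similarity first
    return order, scores[order]
```

For example, `top_k_similar(query_vec, gallery_matrix, k=10)` returns the positions of the ten closest gallery images. For large galleries, an approximate nearest-neighbor index would usually replace the full matrix product.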