
Vit So400m Patch16 Siglip 384.v2 Webli

Developed by timm
A Vision Transformer image encoder from SigLIP 2, designed for image feature extraction and pre-trained on the WebLI dataset.
Downloads 2,073
Release Time: 2/21/2025

Model Overview

This model is the vision encoder of SigLIP 2. It uses the ViT architecture and is suited to image understanding and feature extraction tasks.
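
The following is a minimal sketch of how the encoder might be loaded through timm to produce one embedding per image. It assumes the timm identifier is `vit_so400m_patch16_siglip_384.v2_webli` (inferred from the page title, not stated on this page), that `timm`, `torch`, and Pillow are installed, and that the pretrained weights can be downloaded. The helper name `embed_image` is hypothetical.

```python
def embed_image(image_path, model_name="vit_so400m_patch16_siglip_384.v2_webli"):
    """Return one pooled embedding for an image (a sketch, not an official recipe)."""
    # Imports are local so this module can be imported without the heavy dependencies.
    import timm
    import torch
    from PIL import Image

    # num_classes=0 removes any classifier so the model returns pooled features.
    model = timm.create_model(model_name, pretrained=True, num_classes=0)
    model.eval()

    # Build the preprocessing pipeline that matches the checkpoint's configuration.
    data_config = timm.data.resolve_model_data_config(model)
    transform = timm.data.create_transform(**data_config, is_training=False)

    image = Image.open(image_path).convert("RGB")
    with torch.no_grad():
        features = model(transform(image).unsqueeze(0))  # shape: (1, embedding_dim)
    return features[0]
```

Calling `embed_image("photo.jpg")` (a placeholder path) would download the weights on first use. For many images, create the model once and reuse it rather than rebuilding it on every call as this sketch does.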

Model Features

SigLIP 2 Architecture
Uses the SigLIP 2 architecture, which improves on the original SigLIP in semantic understanding and localization
Dense Feature Extraction
Capable of extracting dense feature representations from images
Large-scale Pre-training
Pre-trained on the large-scale WebLI dataset
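
To make the dense feature extraction point above concrete, here is a small sketch of how a flat sequence of patch tokens maps back onto a spatial grid. It assumes a 384 x 384 input and 16 x 16 patches (both read off the model name), row-major token order, and no extra prefix tokens in the sequence. In timm, unpooled tokens are typically obtained with `model.forward_features(x)`. The helper names `patch_grid` and `tokens_to_grid` are hypothetical.

```python
def patch_grid(image_size=384, patch_size=16):
    """Number of patches along one side of a square input."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    return image_size // patch_size


def tokens_to_grid(tokens, grid_size):
    """Reshape a flat, row-major list of patch tokens into grid_size rows."""
    if len(tokens) != grid_size * grid_size:
        raise ValueError("token count does not match the grid")
    return [tokens[r * grid_size:(r + 1) * grid_size] for r in range(grid_size)]


side = patch_grid()        # 384 // 16 = 24 patches per side
num_tokens = side * side   # 24 * 24 = 576 patch tokens per image

# Integers stand in for real feature vectors to keep the sketch dependency-free.
grid = tokens_to_grid(list(range(num_tokens)), side)
```

Each entry `grid[row][col]` then corresponds to one 16 x 16 image patch, which is what makes the features usable for localization-style tasks.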

Model Capabilities

Image Feature Extraction
Visual Semantic Understanding
Image Localization

Use Cases

Computer Vision
Image Retrieval
Uses extracted image features for similar image retrieval
Vision-Language Tasks
Serves as a visual encoder for multimodal tasks
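
The image retrieval use case listed above can be sketched as a nearest-neighbor search over embeddings. This example assumes cosine similarity as the ranking metric, which is a common choice but is not specified on this page. The toy 3-dimensional vectors stand in for real encoder outputs, and the names `cosine_similarity` and `retrieve` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def retrieve(query, gallery, top_k=3):
    """Return (name, score) pairs for the top_k gallery items most similar to query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in gallery.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings; real ones would come from the image encoder.
gallery = {
    "cat_1": [0.9, 0.1, 0.0],
    "cat_2": [0.8, 0.2, 0.1],
    "car_1": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]
results = retrieve(query, gallery, top_k=2)
print([name for name, _ in results])  # ['cat_1', 'cat_2']
```

For large galleries, the linear scan here would usually be replaced by pre-normalized embeddings and a vectorized or approximate nearest-neighbor index.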