
vit_so400m_patch14_siglip_378.v2_webli

Developed by timm
A Vision Transformer image encoder based on SigLIP 2, designed for image feature extraction and trained on the WebLI dataset
Downloads: 30
Release Time: 2/21/2025

Model Overview

This is a Vision Transformer based on the SigLIP 2 architecture. It contains only the image encoder and is intended for image feature extraction. The model is implemented in the timm library and is functionally equivalent to the image tower of the ViT-SO400M-14-SigLIP2-378 model on Hugging Face.
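A minimal sketch of extracting features with timm follows. It assumes the timm identifier vit_so400m_patch14_siglip_378.v2_webli (inferred from the model title) and uses standard timm calls (create_model, resolve_model_data_config, create_transform, forward_features, forward_head). The helper names patch_grid and extract_features are hypothetical. Weights are downloaded from the Hugging Face Hub on first use.

```python
# Minimal sketch: load the image tower with timm and extract features.
# Assumption: the timm identifier below is inferred from the model title.
MODEL_NAME = "vit_so400m_patch14_siglip_378.v2_webli"


def patch_grid(img_size: int, patch_size: int) -> tuple[int, int]:
    """Number of patches along each side of a square input image."""
    side = img_size // patch_size
    return side, side


def extract_features(image_path: str):
    """Return (dense patch tokens, pooled embedding) for one image.

    Imports are local so this module loads even where timm is not installed.
    """
    import timm
    import torch
    from PIL import Image

    # num_classes=0 drops any classifier so the model outputs features only.
    model = timm.create_model(MODEL_NAME, pretrained=True, num_classes=0)
    model.eval()

    # Build preprocessing (resize, normalize) from the model's pretrained config.
    data_config = timm.data.resolve_model_data_config(model)
    transform = timm.data.create_transform(**data_config, is_training=False)

    image = Image.open(image_path).convert("RGB")
    batch = transform(image).unsqueeze(0)  # add a batch dimension: (1, 3, H, W)

    with torch.no_grad():
        tokens = model.forward_features(batch)  # unpooled per-patch tokens
        pooled = model.forward_head(tokens, pre_logits=True)  # one vector per image
    return tokens, pooled


if __name__ == "__main__":
    print(patch_grid(378, 14))  # (27, 27), i.e. 729 patch tokens
```

forward_features returns one token per image patch, which is what dense tasks consume, while forward_head(..., pre_logits=True) returns a single pooled vector per image, which suits retrieval. With a 378-pixel input and 14-pixel patches, the patch grid is 27 × 27.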

Model Features

SigLIP 2 Architecture
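Uses the SigLIP 2 training recipe, which extends SigLIP's image-text objective with captioning-based pretraining and self-supervised losses (self-distillation and masked prediction) to improve semantic understanding, localization, and dense prediction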
Dense Feature Extraction
Produces dense, per-patch feature representations in addition to a pooled image embedding
Large-scale Pretraining
Pretrained on WebLI, Google's large-scale web image-text dataset
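The dense output can be rearranged into a spatial feature map for segmentation-style or localization heads. The sketch below assumes the tokens are laid out row-major over a square grid with no class or register tokens, and it uses random data in place of real model output. The 1152-dimensional width is an assumption about the SO400M encoder, and tokens_to_feature_map is a hypothetical helper.

```python
import numpy as np


def tokens_to_feature_map(tokens: np.ndarray) -> np.ndarray:
    """Reshape (B, N, C) patch tokens into a (B, C, H, W) feature map.

    Assumes a square, row-major patch grid and no class/register tokens.
    """
    b, n, c = tokens.shape
    side = int(round(n ** 0.5))
    if side * side != n:
        raise ValueError(f"{n} tokens do not form a square grid")
    # (B, N, C) -> (B, H, W, C) -> (B, C, H, W)
    return tokens.reshape(b, side, side, c).transpose(0, 3, 1, 2)


# Random stand-in for model output: 27 x 27 = 729 patches, 1152-dim tokens (assumed width).
fake_tokens = np.random.default_rng(0).standard_normal((1, 729, 1152)).astype(np.float32)
fmap = tokens_to_feature_map(fake_tokens)
print(fmap.shape)  # (1, 1152, 27, 27)
```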

Model Capabilities

Image Feature Extraction
Visual Semantic Understanding
Image Localization

Use Cases

Computer Vision
Image Retrieval
Uses extracted image embeddings to retrieve visually similar images
Visual Localization
Identifies and locates specific objects or regions in images
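Both computer-vision use cases above can be sketched with cosine similarity over embeddings. This sketch assumes pooled embeddings for retrieval and per-patch tokens for localization (for example, produced with timm's forward_head and forward_features). The helpers retrieve and similarity_heatmap are hypothetical, and scoring patches against a query vector is one simple localization approach, not necessarily how the model's authors evaluate localization.

```python
import numpy as np


def l2_normalize(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Scale vectors to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norms, 1e-12)


def retrieve(query: np.ndarray, gallery: np.ndarray, k: int = 5) -> np.ndarray:
    """Indices of the k gallery embeddings most similar to `query`.

    query: (C,) pooled embedding; gallery: (M, C) pooled embeddings.
    """
    sims = l2_normalize(gallery) @ l2_normalize(query)
    return np.argsort(-sims)[:k]


def similarity_heatmap(query: np.ndarray, tokens: np.ndarray, side: int) -> np.ndarray:
    """Cosine similarity of each patch token to `query`, as a (side, side) map.

    tokens: (side * side, C) dense patch tokens for one image, row-major.
    """
    sims = l2_normalize(tokens) @ l2_normalize(query)
    return sims.reshape(side, side)


rng = np.random.default_rng(0)
gallery = rng.standard_normal((100, 64))  # stand-in for 100 pooled image embeddings
top = retrieve(gallery[42], gallery, k=3)  # index 42 ranks first

tokens = rng.standard_normal((9, 64))  # stand-in for a 3 x 3 patch grid
heat = similarity_heatmap(tokens[5], tokens, side=3)
row, col = np.unravel_index(np.argmax(heat), heat.shape)  # row 1, column 2
```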
Multimodal Applications
Vision-Language Tasks
Serves as the visual encoder for tasks such as image-text matching, paired with a text encoder that this model does not include
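Image-text matching needs text embeddings, which this image-only model does not produce. Assuming text embeddings from a compatible SigLIP 2 text tower, SigLIP scores each image-text pair independently with a sigmoid over a scaled, biased cosine similarity, rather than a softmax across the batch. The sketch below shows that scoring rule. logit_scale and logit_bias are hypothetical placeholders here, since the real values are learned parameters of the full model, and match_probabilities is a hypothetical helper.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def match_probabilities(image_emb, text_emb, logit_scale, logit_bias):
    """SigLIP-style pairwise match probabilities.

    image_emb: (I, C) image embeddings; text_emb: (T, C) text embeddings.
    logit_scale is the already-exponentiated temperature.
    Returns an (I, T) matrix of independent sigmoid probabilities.
    """
    img = image_emb / np.linalg.norm(image_emb, axis=-1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=-1, keepdims=True)
    logits = logit_scale * (img @ txt.T) + logit_bias
    return sigmoid(logits)


# Toy embeddings: image i matches text i. Scale and bias are placeholder values.
image_emb = np.array([[1.0, 0.0], [0.0, 1.0]])
text_emb = np.array([[1.0, 0.0], [0.0, 1.0]])
probs = match_probabilities(image_emb, text_emb, logit_scale=10.0, logit_bias=-5.0)
# matching pairs: sigmoid(5) ≈ 0.993; mismatched pairs: sigmoid(-5) ≈ 0.007
```

Because each pair gets its own sigmoid, the rows of probs need not sum to one, so an image can match several captions or none.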