Vit So400m Patch16 Siglip Gap 384.v2 Webli

Developed by timm
A SigLIP 2 ViT image encoder that uses global average pooling in place of the attention pooling head, suited to image feature extraction.
Downloads 19
Release Time: 2/21/2025

Model Overview

This model is a SigLIP 2 ViT image encoder for use with timm, intended mainly for image feature extraction. It was pretrained on the WebLI dataset and uses global average pooling (GAP) in place of the attention pooling head.
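As a rough illustration of the feature-extraction workflow described above, the sketch below loads the encoder through timm and returns one pooled feature vector for an image. The model identifier is an assumption inferred from the page title, and `extract_features` is a hypothetical helper name; check the identifier against the model hub before relying on it. The heavy dependencies (timm, torch, Pillow) are imported inside the function so the module itself imports without them.

```python
from typing import Any

# Assumption: timm identifier inferred from the page title; verify before use.
MODEL_NAME = "vit_so400m_patch16_siglip_gap_384.v2_webli"


def extract_features(image_path: str) -> Any:
    """Return pooled image features as a torch.Tensor of shape (1, num_features)."""
    # Imported lazily so this module can be imported without the heavy packages.
    import timm
    import torch
    from PIL import Image

    # num_classes=0 removes any classifier so the model returns pooled features.
    model = timm.create_model(MODEL_NAME, pretrained=True, num_classes=0).eval()

    # Build preprocessing (resize, normalization) from the model's pretrained config.
    data_config = timm.data.resolve_model_data_config(model)
    transform = timm.data.create_transform(**data_config, is_training=False)

    image = Image.open(image_path).convert("RGB")
    with torch.no_grad():
        return model(transform(image).unsqueeze(0))
```

Calling `extract_features("photo.jpg")` (a placeholder path) downloads the pretrained weights on first use, so in practice the model would be created once and reused across images rather than rebuilt per call.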

Model Features

SigLIP 2 Architecture
Builds on SigLIP 2, which improves on the original SigLIP in semantic understanding, localization, and dense feature extraction
Global Average Pooling
Replaces the attention pooling head with global average pooling (GAP), simplifying the model structure
Large-scale Pretraining
Pretrained on the large-scale WebLI dataset
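To make the global average pooling feature above concrete, here is a minimal pure-Python sketch of the operation itself: the per-patch token vectors are averaged element-wise into a single image vector, with no learned parameters involved. The function name and the toy token values are illustrative only and are not taken from the model's code. Assuming the model name encodes a 384-pixel input and 16-pixel patches, the real encoder would pool 24 x 24 = 576 patch tokens.

```python
from typing import List, Sequence


def global_average_pool(tokens: Sequence[Sequence[float]]) -> List[float]:
    """Average tokens of shape (num_tokens, dim) into one vector of length dim."""
    if not tokens:
        raise ValueError("expected at least one token")
    num_tokens = len(tokens)
    dim = len(tokens[0])
    # Mean over the token axis, computed independently for each feature dimension.
    return [sum(tok[d] for tok in tokens) / num_tokens for d in range(dim)]


# Toy example: three patch tokens with two feature dimensions each.
patch_tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
pooled = global_average_pool(patch_tokens)
print(pooled)  # [3.0, 4.0]
```

An attention pooling head would instead weight tokens with learned parameters; plain averaging is what keeps the GAP variant structurally simpler.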

Model Capabilities

Image Feature Extraction
Visual Semantic Understanding
Dense Feature Extraction

Use Cases

Computer Vision
Image Retrieval
Uses extracted image features for similar image retrieval
Visual Localization
Identifies and understands specific regions and objects in images
Multimodal Applications
Vision-Language Tasks
Serves as a visual encoder for joint vision-language tasks
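The image retrieval use case above can be sketched as a nearest-neighbour search over feature vectors. The example below ranks a small gallery by cosine similarity to a query vector. The function names, gallery keys, and three-dimensional vectors are made up for illustration; real features from the encoder would be much higher-dimensional, and cosine similarity is one common choice of metric rather than something the source prescribes.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(
    query: Sequence[float],
    gallery: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k gallery entries, most similar to the query first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in gallery.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical gallery of precomputed image features.
gallery = {
    "cat_1": [0.9, 0.1, 0.0],
    "cat_2": [0.8, 0.2, 0.1],
    "car_1": [0.0, 0.1, 0.9],
}
results = retrieve([1.0, 0.0, 0.0], gallery, top_k=2)
print([name for name, _ in results])  # ['cat_1', 'cat_2']
```

For large galleries, the same ranking is usually done with batched matrix operations or an approximate nearest-neighbour index instead of a Python loop.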