
vit_base_patch16_siglip_gap_224.v2_webli

Developed by timm
Vision Transformer model based on SigLIP 2, utilizing global average pooling for image features
Downloads: 303
Release Date: 2/21/2025

Model Overview

This is a SigLIP 2 ViT image encoder packaged for timm. The attention pooling head is removed, and image features are obtained by global average pooling (GAP) over the patch tokens.
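Below is a minimal feature-extraction sketch using the standard timm API. It assumes the model identifier matches this card's title and that the pretrained weights are available from the Hugging Face Hub; the image path is a placeholder.

```python
from PIL import Image
import timm
import torch

# Assumed model id (taken from this card's title); weights are fetched on first use.
model = timm.create_model(
    'vit_base_patch16_siglip_gap_224.v2_webli',
    pretrained=True,
    num_classes=0,  # no classifier head: the model returns pooled image features
)
model = model.eval()

# Preprocessing that matches the model's pretraining configuration.
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

img = Image.open('example.jpg').convert('RGB')  # placeholder path, replace with a real image

with torch.no_grad():
    features = model(transforms(img).unsqueeze(0))  # globally averaged embedding, e.g. (1, 768)
```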

Model Features

Global Average Pooling
Uses global average pooling (GAP) in place of an attention pooling head, which simplifies feature extraction
SigLIP 2 Improvements
Built on the SigLIP 2 architecture, with improved semantic understanding and localization over the original SigLIP
Dense Feature Extraction
Produces high-quality dense, per-patch feature representations (see the sketch after this list)
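The dense tokens and the pooled embedding can both be accessed through timm's generic forward_features / forward_head split, as in this hedged sketch; the shapes in the comments are the values expected for a Base/16 model at 224x224 input, not verified against this checkpoint.

```python
import torch
import timm

# Same assumed model id as above.
model = timm.create_model(
    'vit_base_patch16_siglip_gap_224.v2_webli', pretrained=True, num_classes=0,
).eval()

x = torch.randn(1, 3, 224, 224)  # dummy batch; use the model's transforms for real images

with torch.no_grad():
    tokens = model.forward_features(x)                    # dense patch tokens, expected (1, 196, 768)
    pooled = model.forward_head(tokens, pre_logits=True)  # GAP over the tokens, expected (1, 768)
```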

Model Capabilities

Image Feature Extraction
Visual Semantic Understanding
Multimodal Task Support

Use Cases

Computer Vision
Image Retrieval
Uses the extracted image embeddings for similar-image search (see the retrieval sketch after this list)
Multimodal Tasks
Serves as a visual encoder for vision-language joint tasks
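As an illustration of the image-retrieval use case, here is a small, hypothetical helper (not part of timm) that ranks a gallery of precomputed embeddings by cosine similarity to a query embedding produced by the encoder above.

```python
import torch
import torch.nn.functional as F

def top_k_similar(query_emb: torch.Tensor, gallery_embs: torch.Tensor, k: int = 5):
    """Return indices and cosine-similarity scores of the k closest gallery embeddings.

    query_emb: (D,) embedding of the query image.
    gallery_embs: (N, D) embeddings of the indexed images.
    """
    q = F.normalize(query_emb.unsqueeze(0), dim=-1)  # (1, D), unit length
    g = F.normalize(gallery_embs, dim=-1)            # (N, D), unit length
    sims = (q @ g.T).squeeze(0)                      # cosine similarities, shape (N,)
    scores, idx = sims.topk(min(k, gallery_embs.shape[0]))
    return idx, scores
```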