Vit Base Patch16 Siglip Gap 256.v2 Webli

Developed by timm
A ViT image encoder based on SigLIP 2, employing global average pooling with the attention pooling head removed, suitable for image feature extraction.
Downloads 114
Release Time: 2/21/2025

Model Overview

This model is a SigLIP 2 ViT image encoder packaged for timm and intended primarily for image feature extraction. It was trained on the WebLI dataset and uses global average pooling in place of the attention pooling head.

Model Features

SigLIP 2 Architecture
Uses the SigLIP 2 architecture, which improves on the original SigLIP in semantic understanding and localization.
Global Average Pooling
Replaces the attention pooling head with global average pooling (GAP), simplifying the model structure.
WebLI Dataset Training
Pre-trained on the large-scale WebLI dataset.
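
To make the global average pooling feature concrete, the sketch below averages a set of patch-token vectors into a single image feature vector. It is a toy illustration in plain Python, not the timm implementation: the function name `global_average_pool` and the tiny 4-token, 3-channel input are assumptions chosen for readability.

```python
def global_average_pool(tokens):
    """Average a sequence of patch-token vectors into one feature vector.

    `tokens` is a list of equal-length lists of floats, one per image patch.
    Hypothetical helper for illustration only.
    """
    if not tokens:
        raise ValueError("expected at least one token")
    n = len(tokens)
    dim = len(tokens[0])
    # Mean over the token axis, computed independently for each channel.
    return [sum(tok[i] for tok in tokens) / n for i in range(dim)]


# Toy input: 4 patch tokens with 3 channels each (real encoders use far more).
patch_tokens = [
    [1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0],
    [0.0, 0.0, 0.0],
    [4.0, 4.0, 4.0],
]
pooled = global_average_pool(patch_tokens)
print(pooled)
```

Unlike an attention pooling head, this step has no learned parameters, which is the sense in which GAP simplifies the model structure.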

Model Capabilities

Image Feature Extraction
Visual Semantic Understanding
Image Localization

Use Cases

Computer Vision
Image Retrieval
Uses extracted image features for similar image retrieval.
Visual Question Answering
Serves as the image encoder component in vision-language models.
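
The image retrieval use case above can be sketched as ranking stored feature vectors by cosine similarity to a query vector. This is a minimal illustration under stated assumptions: the 3-dimensional vectors and the names `cat_1`, `cat_2`, `car_1`, `cosine_similarity`, and `retrieve` are hypothetical stand-ins, and in practice the vectors would come from the encoder's pooled output.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat a zero vector as dissimilar to everything.
    return dot / (norm_a * norm_b)


def retrieve(query, gallery, top_k=2):
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in gallery.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical gallery of precomputed image features.
gallery = {
    "cat_1": [0.9, 0.1, 0.0],
    "cat_2": [0.8, 0.2, 0.1],
    "car_1": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]
results = retrieve(query, gallery, top_k=2)
for name, score in results:
    print(name, round(score, 3))
```

For large galleries, an approximate nearest-neighbor index would typically replace the linear scan shown here.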