vit_so400m_patch14_siglip_gap_224.v2_webli Open-source Model - Highly Efficient for Image Feature Extraction Tasks

Vit So400m Patch14 Siglip Gap 224.v2 Webli

Developed by timm

A ViT image encoder based on SigLIP 2, employing global average pooling with the attention pooling head removed, suitable for image feature extraction tasks.

Image Classification

Transformers

Open Source License: Apache-2.0

Tags: #Multimodal Visual Encoding #Global Average Pooling #Semantic Understanding Enhancement

Downloads 179

Release Time: 2/21/2025

Model Overview

This is a SigLIP 2 ViT image encoder packaged for use with timm. It is equivalent to the image tower of the ViT-SO400M-14-SigLIP2 model on Hugging Face. The gap variant replaces the attention pooling head with global average pooling.
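
As a rough sketch of how the encoder might be loaded through timm, the snippet below uses timm's `create_model`, `resolve_model_data_config`, and `create_transform`. It assumes a timm release recent enough to register this model name, an installed torch and Pillow, and network access to download the weights. The helper name `embed_image` is hypothetical and not part of timm.

```python
MODEL_NAME = "vit_so400m_patch14_siglip_gap_224.v2_webli"


def embed_image(image, model_name=MODEL_NAME):
    """Return a pooled embedding for one PIL image (hypothetical helper)."""
    # Imported lazily so that defining this helper does not require timm/torch.
    import timm
    import torch

    # num_classes=0 requests the pooled features without a classifier layer.
    model = timm.create_model(model_name, pretrained=True, num_classes=0).eval()

    # Build the preprocessing pipeline that matches the pretrained config.
    data_config = timm.data.resolve_model_data_config(model)
    transform = timm.data.create_transform(**data_config, is_training=False)

    with torch.no_grad():
        batch = transform(image.convert("RGB")).unsqueeze(0)  # add batch axis
        return model(batch)  # shape: (1, model.num_features)
```

The helper reloads the model on every call to keep the sketch short. In practice the model and transform would be created once and reused across images.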

Model Features

SigLIP 2 Architecture

Built on SigLIP 2, which its paper describes as improving semantic understanding, localization, and dense features over the original SigLIP.

Global Average Pooling

Uses global average pooling (gap) instead of the attention pooling head, simplifying the model structure.
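
To illustrate the pooling step in isolation, here is a minimal NumPy sketch that averages patch tokens into one vector per image. It assumes the encoder emits only patch tokens (no class token) and omits any normalization layers; the random array stands in for real encoder output, and the width of 8 is illustrative rather than the model's actual embedding size.

```python
import numpy as np


def global_average_pool(tokens):
    """Average over the token axis: (batch, tokens, dim) -> (batch, dim)."""
    return tokens.mean(axis=1)


# A 224x224 input split into 14x14 patches gives a 16x16 grid of patch tokens.
num_tokens = (224 // 14) ** 2
embed_dim = 8  # illustrative width only, not the real model's

rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, num_tokens, embed_dim))  # stand-in for encoder output

pooled = global_average_pool(tokens)
print(pooled.shape)  # (2, 8)
```

Unlike an attention pooling head, this step has no learned parameters: every patch token contributes equally to the pooled vector.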

Large-scale Pretraining

Pretrained on WebLI, a large-scale web image-text dataset.

Model Capabilities

Image Feature Extraction

Visual Semantic Understanding

Image Localization

Dense Feature Extraction
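
For dense features, the per-patch tokens can be laid back out on the image grid before any pooling. The sketch below is one way to do that in NumPy. It assumes the tokens arrive in row-major (raster) order with no class token, and the helper name `tokens_to_grid` is hypothetical.

```python
import numpy as np


def tokens_to_grid(tokens, image_size=224, patch_size=14):
    """Reshape (num_tokens, dim) patch features into a (rows, cols, dim) grid."""
    side = image_size // patch_size
    if tokens.shape[0] != side * side:
        raise ValueError(f"expected {side * side} tokens, got {tokens.shape[0]}")
    return tokens.reshape(side, side, tokens.shape[-1])


# Dummy features: 256 tokens with 4 channels each.
tokens = np.arange(256 * 4, dtype=np.float32).reshape(256, 4)
grid = tokens_to_grid(tokens)
print(grid.shape)  # (16, 16, 4)
```

Each cell `grid[r, c]` then holds the feature vector for the 14x14 pixel patch at row `r`, column `c` of the input.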

Use Cases

Computer Vision

Image Classification

Can serve as a feature extractor for image classification tasks.
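
One simple way to classify on top of frozen features is a nearest-centroid classifier over pooled embeddings. The sketch below is an illustration of that idea, not a method prescribed by the model authors. It uses synthetic vectors in place of real embeddings, and the function names are hypothetical.

```python
import numpy as np


def fit_centroids(features, labels):
    """Return sorted class ids and one L2-normalized mean feature per class."""
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)
    return classes, centroids


def predict(features, classes, centroids):
    """Assign each feature to the class with the most cosine-similar centroid."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    return classes[(normed @ centroids.T).argmax(axis=1)]


# Synthetic stand-in for pooled embeddings: two well-separated clusters.
rng = np.random.default_rng(0)
train_x = np.concatenate([
    rng.normal(3.0, 1.0, size=(20, 16)),
    rng.normal(-3.0, 1.0, size=(20, 16)),
])
train_y = np.array([0] * 20 + [1] * 20)
test_x = np.concatenate([
    rng.normal(3.0, 1.0, size=(5, 16)),
    rng.normal(-3.0, 1.0, size=(5, 16)),
])

classes, centroids = fit_centroids(train_x, train_y)
preds = predict(test_x, classes, centroids)
print(preds)
```

With real data, `train_x` and `test_x` would be the pooled encoder outputs, and a trained linear layer would be a common alternative to centroids.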

Visual Question Answering

Provides image feature representations for visual question answering systems.

Multimodal Applications

Image-Text Matching

Used for image encoding in image-text matching tasks.

Property	Details
Dataset	webli
Papers	SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features: https://arxiv.org/abs/2502.14786
	Sigmoid Loss for Language Image Pre-Training: https://arxiv.org/abs/2303.15343
