
vit_so400m_patch14_siglip_224.webli

Developed by timm
A Vision Transformer from the SigLIP model family, containing only the image encoder and using SigLIP's original attention pooling mechanism
Downloads: 123
Release Time: 12/24/2024

Model Overview

This is a Vision Transformer model based on the SigLIP architecture, designed for image feature extraction. It uses a 14x14 patch size at 224x224 input resolution, which yields 16x16 = 256 patch tokens per image.

Model Features

SigLIP Attention Pooling
Uses SigLIP's attention pooling (MAP) head, in which a learned probe token attends over the patch tokens to produce the pooled image representation
Large Model Scale
A large-scale vision model with roughly 400M parameters (the shape-optimized SoViT-400M architecture), capable of capturing rich image features
224x224 Input Resolution
Processes images at 224x224 input resolution, suitable for general-purpose feature extraction
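The attention pooling head mentioned above can be sketched in plain PyTorch: a single learned probe token cross-attends over the patch tokens, replacing mean pooling or a [CLS] token. The dimensions below are illustrative, not the model's actual configuration.

```python
import torch
import torch.nn as nn


class AttentionPool(nn.Module):
    """Minimal MAP-style pooling: one learned query attends over patch tokens."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.probe = nn.Parameter(torch.zeros(1, 1, dim))  # learned query token
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_patches, dim) -> pooled: (batch, dim)
        probe = self.probe.expand(tokens.shape[0], -1, -1)
        pooled, _ = self.attn(probe, tokens, tokens)   # cross-attention
        pooled = pooled + self.mlp(self.norm(pooled))  # residual MLP block
        return pooled.squeeze(1)


pool = AttentionPool(dim=64)
patch_tokens = torch.randn(2, 256, 64)  # e.g. 16x16 patches from a 224px image
out = pool(patch_tokens)                # (2, 64)
```

Because the probe is learned jointly with the encoder, the pooling can weight informative patches more heavily than a uniform average would.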

Model Capabilities

Image feature extraction
Visual representation learning

Use Cases

Computer Vision
Image Classification
Can serve as the base feature extractor for image classification tasks
Visual Search
Used as the feature extraction component for building visual search engines
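For the visual search use case, the encoder's features can back a simple nearest-neighbor index. This sketch assumes a hypothetical in-memory gallery of precomputed, L2-normalized feature vectors and ranks them by cosine similarity:

```python
import torch
import torch.nn.functional as F

# Hypothetical precomputed gallery: 1000 images x 1152-dim features
gallery = F.normalize(torch.randn(1000, 1152), dim=-1)


def search(query_feat: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Return indices of the k gallery images most similar to the query."""
    q = F.normalize(query_feat, dim=-1)
    scores = gallery @ q  # cosine similarity (both sides are unit-norm)
    return scores.topk(k).indices


query = F.normalize(torch.randn(1152), dim=-1)
top5 = search(query)  # indices of the 5 most similar gallery images
```

For large galleries, the brute-force matrix product would typically be replaced by an approximate nearest-neighbor index, but the scoring logic stays the same.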