vit_large_patch16_siglip_gap_384.webli
A Vision Transformer (ViT) model pre-trained with SigLIP and using global average pooling, suitable for image feature extraction tasks.
Release Time: 12/24/2024
Model Overview
This model is a Vision Transformer architecture designed for image feature extraction. It is pre-trained with SigLIP (Sigmoid Loss for Language Image Pre-training) and uses global average pooling (GAP) over patch tokens to produce image features.
Model Features
SigLIP Pre-training
Pre-trained with a sigmoid loss on language-image pairs, which improves the quality of the learned visual features.
Global Average Pooling
Employs a global average pooling (GAP) strategy over patch tokens for image feature extraction, simplifying the process.
Large Input Size
Supports large image inputs of 384×384 pixels, suitable for high-resolution image processing.
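The GAP step in the features above can be sketched in a few lines, using illustrative shapes: a 384×384 input with 16×16 patches yields (384/16)² = 576 patch tokens, each 1024-dimensional for a ViT-Large backbone:

```python
# Minimal sketch of global average pooling (GAP) over ViT patch tokens.
import torch

tokens = torch.randn(1, 576, 1024)  # (batch, num_patches, embed_dim)
features = tokens.mean(dim=1)       # GAP: average over the patch axis
print(features.shape)               # (batch, embed_dim)
```

Averaging over the patch axis collapses all spatial tokens into a single fixed-size embedding, which is why no class token is needed.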
Model Capabilities
Image feature extraction
Visual representation learning
Use Cases
Computer Vision
Image Classification
Can serve as a feature extractor backbone for image classification tasks.
Image Retrieval
Extracts image features for similar image retrieval.
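For the retrieval use case, a common pattern is to rank gallery images by cosine similarity of their extracted features. A minimal sketch with dummy 1024-dimensional embeddings (the gallery size and top-k value are illustrative):

```python
# Sketch: similar-image retrieval by cosine similarity of image features.
import torch
import torch.nn.functional as F

# L2-normalize so that a dot product equals cosine similarity.
gallery = F.normalize(torch.randn(100, 1024), dim=1)  # indexed image features
query = F.normalize(torch.randn(1, 1024), dim=1)      # query image feature

scores = query @ gallery.T            # cosine similarities, shape (1, 100)
topk = scores.topk(5, dim=1).indices  # indices of the 5 most similar images
print(topk.shape)
```

In practice the gallery embeddings would come from the model above, computed once and cached.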