
vit_large_patch16_siglip_gap_384.webli

Developed by timm
A Vision Transformer (ViT) model pre-trained with SigLIP on the WebLI dataset, using global average pooling (GAP), suited to image feature extraction tasks.
Release Time: 12/24/2024

Model Overview

This model is a Vision Transformer designed for image feature extraction. It is pre-trained with SigLIP (Sigmoid loss for Language-Image Pre-training) on the WebLI dataset and uses global average pooling (GAP) to produce a single image embedding.
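As a rough illustration of typical usage, the sketch below loads the model through timm with the classification head removed and extracts a pooled feature vector for one image. The image path is a placeholder; the preprocessing (384x384 resize, normalization) is resolved from the model's pretrained configuration.

```python
# Minimal feature-extraction sketch using timm; 'example.jpg' is hypothetical.
import timm
import torch
from PIL import Image

# num_classes=0 removes any classification head, so the model returns the
# pooled (GAP) image embedding directly.
model = timm.create_model(
    'vit_large_patch16_siglip_gap_384.webli', pretrained=True, num_classes=0
)
model.eval()

# Build the preprocessing pipeline from the model's pretrained configuration.
data_config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**data_config, is_training=False)

img = Image.open('example.jpg').convert('RGB')
with torch.no_grad():
    features = model(transform(img).unsqueeze(0))  # shape: (1, embedding_dim)
print(features.shape)
```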

Model Features

SigLIP Pre-training
Pre-trained with a sigmoid loss on image-text pairs (SigLIP), which avoids the global softmax normalization of standard contrastive pre-training and yields strong image representations.
Global Average Pooling
Pools patch tokens with global average pooling (GAP) rather than a class token or attention-pooling head to produce the image embedding (see the sketch after this list).
Large Input Size
Accepts 384x384-pixel inputs split into 16x16-pixel patches, suitable for higher-resolution image processing.
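To see how the GAP head behaves, the sketch below (assuming the same timm API as above, with a random tensor standing in for a preprocessed image) separates the per-patch token features from the pooled embedding.

```python
# Sketch of the GAP pooling step; a dummy tensor replaces a real preprocessed image.
import timm
import torch

model = timm.create_model(
    'vit_large_patch16_siglip_gap_384.webli', pretrained=True, num_classes=0
)
model.eval()
x = torch.randn(1, 3, 384, 384)  # real use should go through the model's transform

with torch.no_grad():
    tokens = model.forward_features(x)   # per-patch tokens, (1, 576, 1024) for 384/16, ViT-L
    pooled = model.forward_head(tokens)  # pooled image embedding, (1, 1024)

# With GAP, the pooled embedding is (up to any final normalization applied by
# the head) simply the mean over the patch tokens:
manual_gap = tokens.mean(dim=1)
print(tokens.shape, pooled.shape, manual_gap.shape)
```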

Model Capabilities

Image feature extraction
Visual representation learning

Use Cases

Computer Vision
Image Classification
Provides image features that can serve as the backbone for image classification, e.g., with a linear classifier trained on top.
Image Retrieval
Extracts image features for similar-image retrieval via embedding similarity (see the sketch below).
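The sketch below shows one way such a retrieval setup could look: embed a small gallery, unit-normalize the features, and rank gallery images by cosine similarity to a query. The file names are hypothetical, and the model/transform setup repeats the first sketch.

```python
# Similar-image retrieval sketch; gallery and query file names are hypothetical.
import timm
import torch
import torch.nn.functional as F
from PIL import Image

model = timm.create_model(
    'vit_large_patch16_siglip_gap_384.webli', pretrained=True, num_classes=0
)
model.eval()
data_config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**data_config, is_training=False)

def embed(paths):
    """Return unit-normalized feature vectors for a list of image paths."""
    batch = torch.stack([transform(Image.open(p).convert('RGB')) for p in paths])
    with torch.no_grad():
        feats = model(batch)              # (N, embedding_dim)
    return F.normalize(feats, dim=-1)     # normalize for cosine similarity

gallery_paths = ['cat1.jpg', 'cat2.jpg', 'dog1.jpg']  # hypothetical gallery
gallery = embed(gallery_paths)
query = embed(['query.jpg'])                          # hypothetical query image

scores = query @ gallery.T                            # cosine similarities, (1, N)
for idx in scores.argsort(descending=True).squeeze(0):
    i = int(idx)
    print(gallery_paths[i], round(float(scores[0, i]), 4))
```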