vit_so400m_patch14_siglip_378.webli
A SigLIP-based Vision Transformer containing only the image encoder, with the original attention pooling head.
Release Time: 12/24/2024
Model Overview
This model is a Vision Transformer image encoder built on the SigLIP architecture. It is designed for image feature extraction and can serve as a backbone for a range of computer vision tasks.
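As a quick orientation, the sketch below shows one way to load the encoder and extract an image embedding, assuming the checkpoint is pulled through the timm library (the model name follows timm's naming convention); the image path is a placeholder.

```python
import timm
import torch
from PIL import Image

# Load the image encoder only; num_classes=0 makes the model return the pooled embedding.
model = timm.create_model(
    'vit_so400m_patch14_siglip_378.webli',
    pretrained=True,
    num_classes=0,
).eval()

# Build the matching 378x378 preprocessing pipeline from the model's pretrained config.
data_config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**data_config, is_training=False)

img = Image.open('example.jpg').convert('RGB')  # placeholder image path
with torch.no_grad():
    embedding = model(transform(img).unsqueeze(0))  # (1, embed_dim) pooled feature vector

print(embedding.shape)
```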
Model Features
SigLIP Architecture
Adopts the image encoder from SigLIP (sigmoid loss for language-image pre-training), aimed at efficient image feature extraction.
Original Attention Pooling
Keeps the original SigLIP attention pooling head (rather than global average pooling) to produce the image embedding; see the sketch after this list.
Large Model Scale
A large model with roughly 400M parameters (the shape-optimized SoViT-400M configuration), suited to demanding vision tasks.
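To illustrate what the attention pooling head contributes, here is a minimal sketch (again assuming timm) of obtaining both the per-patch token features and the attention-pooled embedding; the random tensor stands in for a real preprocessed image batch.

```python
import timm
import torch

model = timm.create_model(
    'vit_so400m_patch14_siglip_378.webli',
    pretrained=True,
    num_classes=0,
).eval()

x = torch.randn(1, 3, 378, 378)  # dummy batch in place of a real preprocessed image
with torch.no_grad():
    tokens = model.forward_features(x)                    # per-patch tokens: 27x27 = 729 patches at 378px with patch size 14
    pooled = model.forward_head(tokens, pre_logits=True)  # single embedding produced by the attention pooling head

print(tokens.shape, pooled.shape)
```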
Model Capabilities
Image feature extraction
Visual representation learning
Use Cases
Computer Vision
Image Classification
Extracts high-quality feature representations for image classification, for example by training a lightweight classifier on frozen embeddings (sketched below).
Object Detection
Can serve as a backbone feature extractor in object detection pipelines.
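As an illustration of the classification use case, a minimal linear-probe sketch on top of frozen embeddings (assuming timm); the class count, batch, and labels are placeholders for a real dataset and training loop.

```python
import timm
import torch
import torch.nn as nn

# Frozen SigLIP image encoder used purely as a feature extractor.
backbone = timm.create_model(
    'vit_so400m_patch14_siglip_378.webli',
    pretrained=True,
    num_classes=0,
).eval()
for p in backbone.parameters():
    p.requires_grad = False

num_classes = 10  # placeholder class count
probe = nn.Linear(backbone.num_features, num_classes)
optimizer = torch.optim.AdamW(probe.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One training step on a dummy batch; replace with a real DataLoader over preprocessed images.
images = torch.randn(8, 3, 378, 378)
labels = torch.randint(0, num_classes, (8,))

with torch.no_grad():
    feats = backbone(images)  # pooled embeddings, one vector per image
logits = probe(feats)
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
```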