
vit_so400m_patch14_siglip_384.webli

Developed by timm
A Vision Transformer based on the SigLIP architecture, containing only the image encoder and using the original attention-pooling head
Downloads 9,429
Release Time: 12/24/2024

Model Overview

This model is the image-encoder component of the SigLIP (Sigmoid Loss for Language-Image Pre-training) architecture. It focuses on image feature extraction and is suited to scenarios that require efficient visual representations.
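As a rough illustration of feature extraction with this encoder, the sketch below loads the model through timm and embeds a single image. It assumes the timm identifier vit_so400m_patch14_siglip_384.webli and a local file example.jpg; both are placeholders to adjust as needed. Passing num_classes=0 strips any classifier so the forward pass returns the pooled embedding directly.

```python
import timm
import torch
from PIL import Image

# Load only the image encoder; num_classes=0 makes the model return
# pooled embeddings instead of classification logits.
# The model name is assumed to be the timm identifier for this card.
model = timm.create_model(
    "vit_so400m_patch14_siglip_384.webli",
    pretrained=True,
    num_classes=0,
)
model.eval()

# Build the matching 384x384 evaluation transform from the model's config.
data_config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**data_config, is_training=False)

# "example.jpg" is a placeholder path for any input image.
img = Image.open("example.jpg").convert("RGB")
with torch.no_grad():
    features = model(transform(img).unsqueeze(0))  # pooled embedding, e.g. shape (1, 1152)

print(features.shape)
```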

Model Features

Efficient Image Encoding
Focuses on image feature extraction, providing efficient visual representation
Original Attention Pooling
Uses the original attention-pooling head to pool patch features, preserving more image detail
SigLIP Architecture
Built on language-image pretraining optimized with a sigmoid loss

Model Capabilities

Image feature extraction
Visual representation learning

Use Cases

Computer Vision
Image Retrieval
Extracts image features for similar-image search (see the retrieval sketch after this list)
Visual Content Understanding
Provides high-quality visual representations for downstream tasks
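Building on the same loading code, here is a hedged sketch of the image-retrieval use case: embed a small gallery, L2-normalize the features, and rank candidates by cosine similarity against a query image. The helper name (embed) and the file paths are illustrative assumptions, not part of the model.

```python
import timm
import torch
import torch.nn.functional as F
from PIL import Image

# Same encoder and preprocessing as in the feature-extraction sketch above.
model = timm.create_model(
    "vit_so400m_patch14_siglip_384.webli", pretrained=True, num_classes=0
).eval()
data_config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**data_config, is_training=False)

def embed(path: str) -> torch.Tensor:
    """Return an L2-normalized embedding for one image (shape: (1, dim))."""
    img = Image.open(path).convert("RGB")
    with torch.no_grad():
        feat = model(transform(img).unsqueeze(0))
    return F.normalize(feat, dim=-1)

# Hypothetical gallery of images to search over.
gallery_paths = ["cat.jpg", "dog.jpg", "car.jpg"]
gallery = torch.cat([embed(p) for p in gallery_paths], dim=0)

# Rank gallery images by cosine similarity to the query image.
query = embed("query.jpg")
scores = gallery @ query.squeeze(0)  # one cosine-similarity score per gallery image
for idx in scores.argsort(descending=True):
    i = int(idx)
    print(gallery_paths[i], float(scores[i]))
```

Because the embeddings are L2-normalized, the dot product equals cosine similarity; for larger galleries the same normalized vectors can be dropped into an approximate-nearest-neighbor index instead of the brute-force matmul shown here.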