Siglip2 Large Patch16 384
SigLIP 2 is an improved multilingual vision-language encoder built on SigLIP, with stronger semantic understanding, localization, and dense feature extraction.
Downloads 6,525
Release Time: 2/17/2025
Model Overview
SigLIP 2 is a vision-language model that can be used for tasks such as zero-shot image classification and image-text retrieval, or as a visual encoder for other vision tasks.
Model Features
Unified Training Scheme
Combines several techniques, including a decoder loss, a global-local loss, and a masked prediction loss, into a unified training scheme.
Adaptive Training
Supports training that adapts to different aspect ratios and resolutions.
Multi-task Capability
Combines semantic understanding, localization, and dense feature extraction in a single model.
Model Capabilities
Zero-shot Image Classification
Image-Text Retrieval
Visual Feature Extraction
Use Cases
Image Understanding
Zero-shot Image Classification
Classifies images into new categories without task-specific training
Supports classification with user-defined labels
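SigLIP-style models score each image-label pair independently with a sigmoid rather than a softmax over all labels, so the label scores need not sum to one. The sketch below shows only that scoring step, on toy three-dimensional vectors; in practice the embeddings would come from the model's image and text encoders (for example through the Hugging Face `transformers` library). The `logit_scale` and `logit_bias` values are placeholders chosen for illustration, not the checkpoint's learned parameters, and the label strings are hypothetical.

```python
import math
from typing import Dict, List


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def zero_shot_scores(
    image_emb: List[float],
    label_embs: Dict[str, List[float]],
    logit_scale: float = 10.0,  # placeholder, not the checkpoint's learned value
    logit_bias: float = -5.0,   # placeholder, not the checkpoint's learned value
) -> Dict[str, float]:
    """Score each label independently as sigmoid(scale * cosine + bias)."""
    img = l2_normalize(image_emb)
    scores = {}
    for label, emb in label_embs.items():
        txt = l2_normalize(emb)
        cosine = sum(a * b for a, b in zip(img, txt))
        scores[label] = sigmoid(logit_scale * cosine + logit_bias)
    return scores


# Toy embeddings standing in for real image/text encoder outputs.
image_emb = [0.9, 0.1, 0.0]
label_embs = {
    "a photo of a cat": [1.0, 0.0, 0.0],
    "a photo of a dog": [0.0, 1.0, 0.0],
    "a photo of a car": [0.0, 0.0, 1.0],
}
scores = zero_shot_scores(image_emb, label_embs)
best = max(scores, key=scores.get)
print(best)  # a photo of a cat
```

Because each label gets its own independent probability, adding or removing custom labels does not change the scores of the others.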
Visual Encoding
Serves as a visual encoder for other vision tasks
Provides high-quality image feature representations
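The model name encodes a 384×384 input resolution and 16×16 patches, which gives a 24×24 grid of 576 patch tokens. A minimal sketch of turning a flat sequence of patch tokens back into that spatial grid, as one might do when using the encoder for dense features, is shown here. It assumes row-major patch ordering and one output token per patch (no extra class token); placeholder strings stand in for real feature vectors.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def patch_grid_side(image_size: int, patch_size: int) -> int:
    """Number of patches along one side of a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    return image_size // patch_size


def tokens_to_grid(tokens: Sequence[T], side: int) -> List[List[T]]:
    """Reshape a flat, row-major token sequence into a side x side grid."""
    if len(tokens) != side * side:
        raise ValueError(f"expected {side * side} tokens, got {len(tokens)}")
    return [list(tokens[row * side:(row + 1) * side]) for row in range(side)]


side = patch_grid_side(384, 16)
# Placeholder tokens; a real encoder would produce one feature vector per patch.
tokens = [f"patch_{i}" for i in range(side * side)]
grid = tokens_to_grid(tokens, side)
print(side, len(tokens), grid[1][0])  # 24 576 patch_24
```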
Cross-modal Applications
Image-Text Retrieval
Retrieves images from text queries, and text from image queries
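Cross-modal retrieval comes down to ranking candidates from one modality by their similarity to a query embedding from the other. This sketch ranks a hypothetical gallery of image embeddings against a hypothetical text-query embedding by cosine similarity; the file names and vectors are made up, and real embeddings would come from the model's two encoders.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(
    query_emb: List[float],
    gallery: Dict[str, List[float]],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the top_k gallery items ranked by cosine similarity to the query."""
    scored = [(item_id, cosine_similarity(query_emb, emb)) for item_id, emb in gallery.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical image embeddings keyed by file name.
image_gallery = {
    "beach.jpg": [0.8, 0.2, 0.1],
    "forest.jpg": [0.1, 0.9, 0.2],
    "city.jpg": [0.2, 0.1, 0.9],
}
# Hypothetical embedding of a text query such as "a sunny beach".
text_query = [0.7, 0.3, 0.0]
results = retrieve(text_query, image_gallery, top_k=2)
print([item_id for item_id, _ in results])  # ['beach.jpg', 'forest.jpg']
```

Ranking by raw cosine similarity gives the same order as ranking by SigLIP-style sigmoid scores, since `sigmoid(scale * cosine + bias)` is increasing in the cosine whenever the scale is positive. The same function works in the other direction by passing an image embedding as the query and a gallery of text embeddings.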