Siglip2 Base Patch16 224
SigLIP 2 is an improved multilingual vision-language encoder based on SigLIP, enhancing semantic understanding, localization, and dense feature extraction capabilities.
Downloads: 44.75k
Release date: 2/17/2025
Model Overview
SigLIP 2 is a vision-language model that can be used for tasks such as zero-shot image classification and image-text retrieval, and can also serve as a vision encoder for downstream vision tasks.
Model Features
Improved Training Objectives
Combines the SigLIP sigmoid loss with additional training objectives, including a decoder loss, global-local and masked prediction losses, and adaptability to varying aspect ratios and resolutions.
Multi-task Capability
Supports various vision-language tasks such as zero-shot image classification and image-text retrieval.
Large-scale Pretraining
Pretrained on the WebLI dataset using up to 2048 TPU-v5e chips.
Model Capabilities
Zero-shot Image Classification
Image-Text Retrieval
Visual Feature Extraction
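For image-text retrieval, the image and candidate texts can be scored jointly. A minimal sketch using the Hugging Face `transformers` API (an assumption, since the card does not name a framework; the solid-color test image and the candidate texts are illustrative placeholders):

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

ckpt = "google/siglip2-base-patch16-224"
model = AutoModel.from_pretrained(ckpt)
processor = AutoProcessor.from_pretrained(ckpt)

# Placeholder image; in practice, load a real photo with Image.open(...).
image = Image.new("RGB", (224, 224), color=(255, 0, 0))
texts = ["a red image", "a blue image"]

# SigLIP processors recommend padding="max_length" for text inputs.
inputs = processor(text=texts, images=image, padding="max_length", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# SigLIP trains with a sigmoid (not softmax) loss, so scores are per-pair.
probs = torch.sigmoid(outputs.logits_per_image)  # shape: (num_images, num_texts)
```

The highest-scoring text for each image (or image for each text) gives the retrieval result; note that sigmoid scores do not sum to 1 across candidates.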
Use Cases
Image Analysis
Zero-shot Image Classification
Classify images into user-provided candidate labels without task-specific training.
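Zero-shot classification can be sketched with the `transformers` pipeline API (an assumption on the card's part; the checkpoint name matches this model, while the dummy image and label set are placeholders for illustration):

```python
from PIL import Image
from transformers import pipeline

classifier = pipeline(
    task="zero-shot-image-classification",
    model="google/siglip2-base-patch16-224",
)

# Placeholder input; replace with a real image or file path.
image = Image.new("RGB", (224, 224), color=(255, 0, 0))
labels = ["a red square", "a photo of a cat", "a photo of a dog"]

# Returns one {"score": ..., "label": ...} dict per candidate label,
# sorted by score in descending order.
results = classifier(image, candidate_labels=labels)
```

The candidate labels are free-form text, so the label set can be changed at inference time without retraining.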
Visual Feature Extraction
Extract visual feature representations of images.
Can be used for downstream vision tasks.
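Extracting image embeddings for downstream tasks can be sketched as follows, again assuming the `transformers` API (the dummy image is a placeholder):

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

ckpt = "google/siglip2-base-patch16-224"
model = AutoModel.from_pretrained(ckpt)
processor = AutoProcessor.from_pretrained(ckpt)

# Placeholder image; replace with a real photo via Image.open(...).
image = Image.new("RGB", (224, 224), color=(0, 128, 255))

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    # One embedding vector per image, usable for retrieval,
    # clustering, or as input to a downstream head.
    features = model.get_image_features(**inputs)
```

The resulting tensor has one row per input image and can be L2-normalized before cosine-similarity comparisons.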