Siglip2 So400m Patch14 224
SigLIP 2 is an improved multilingual vision-language encoder based on SigLIP, enhancing semantic understanding, localization, and dense feature extraction capabilities.
Downloads 23.11k
Release Time: 2/17/2025
Model Overview
SigLIP 2 is a vision-language model that can be used for zero-shot image classification, image-text retrieval, and other tasks, or as a visual encoder for other vision tasks.
Model Features
Improved semantic understanding
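Combines the original sigmoid image-text objective with captioning-based pretraining, self-supervised losses (self-distillation and masked prediction), and online data curation to strengthen semantic understanding.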
Enhanced localization ability
Improves the model's localization ability through global-local and masked prediction losses.
Dense feature extraction
Capable of extracting dense features from images, suitable for various vision tasks.
Aspect ratio and resolution adaptability
Supports input images with different aspect ratios and resolutions.
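The model name indicates a 14-pixel patch size and a 224-pixel input resolution, which determines how many dense feature vectors the encoder produces per image. The following is a rough sketch of that arithmetic, assuming a standard ViT-style split into non-overlapping square patches; the helper `patch_grid` is hypothetical and not part of any library.

```python
def patch_grid(height: int, width: int, patch_size: int = 14) -> tuple[int, int]:
    """Return (rows, cols) of non-overlapping square patches for an image.

    Hypothetical helper for illustration only; it assumes the image size
    is an exact multiple of the patch size.
    """
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be a multiple of the patch size")
    return height // patch_size, width // patch_size


# A 224x224 input with 14x14 patches gives a 16x16 grid.
rows, cols = patch_grid(224, 224)
num_patches = rows * cols
print(rows, cols, num_patches)  # 16 16 256
```

Under these assumptions, each image yields one feature vector per patch, so a downstream dense task would receive 256 patch-level features.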
Model Capabilities
Zero-shot image classification
Image-text retrieval
Visual encoding
Use Cases
Image classification
Zero-shot image classification
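Classifies images without task-specific training; candidate labels are supplied as free-form text at inference time.
The SigLIP 2 authors report improvements over the original SigLIP on zero-shot classification benchmarks.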
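To illustrate how zero-shot classification with custom labels can work, here is a minimal pure-Python sketch of SigLIP-style scoring: each label gets an independent sigmoid probability from the scaled similarity between the image embedding and the label's text embedding. The embeddings, the `scale` and `bias` values, and the function names are all made up for illustration; in the real model the embeddings come from the encoders and the scale and bias are learned.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def zero_shot_scores(image_emb, label_embs, scale=10.0, bias=-5.0):
    """Score each label independently with a sigmoid over scaled similarity.

    scale and bias are placeholder values (assumptions), not the model's
    learned parameters.
    """
    img = normalize(image_emb)
    scores = {}
    for label, emb in label_embs.items():
        txt = normalize(emb)
        logit = scale * sum(a * b for a, b in zip(img, txt)) + bias
        scores[label] = sigmoid(logit)
    return scores


# Toy 3-dimensional embeddings standing in for real encoder outputs.
image_emb = [0.9, 0.1, 0.0]
label_embs = {
    "a photo of a cat": [1.0, 0.0, 0.0],
    "a photo of a dog": [0.0, 1.0, 0.0],
    "a photo of a car": [0.0, 0.0, 1.0],
}

scores = zero_shot_scores(image_emb, label_embs)
best = max(scores, key=scores.get)
print(best)  # a photo of a cat
```

Because each label is scored with its own sigmoid rather than a softmax over all labels, the scores need not sum to 1, and an image can score low on every label.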
Image-text retrieval
Image-text matching
Retrieves relevant images for a text description, or relevant text descriptions for an image. As an encoder, the model ranks existing candidates rather than generating new text.
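Retrieval of this kind usually reduces to ranking candidates by embedding similarity. The sketch below assumes image and text embeddings have already been computed in a shared space and ranks images for a text query by cosine similarity. The file names, vectors, and the `retrieve` helper are hypothetical placeholders.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_emb, candidates, top_k=2):
    """Return the names of the top_k candidates most similar to the query."""
    ranked = sorted(
        candidates,
        key=lambda name: cosine(query_emb, candidates[name]),
        reverse=True,
    )
    return ranked[:top_k]


# Toy embeddings standing in for real image-encoder outputs.
image_embs = {
    "beach.jpg": [0.9, 0.1, 0.1],
    "forest.jpg": [0.1, 0.9, 0.2],
    "city.jpg": [0.1, 0.2, 0.9],
}

# Toy embedding standing in for an encoded text query.
text_query = [0.8, 0.2, 0.0]

top_images = retrieve(text_query, image_embs)
print(top_images)  # ['beach.jpg', 'forest.jpg']
```

The same function covers the image-to-text direction: pass an image embedding as the query and a dictionary of caption embeddings as the candidates.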