ViT-SO400M-16-SigLIP2-512
A SigLIP 2 vision-language model trained on the WebLI dataset, suitable for zero-shot image classification tasks.
Downloads: 1,191
Release Date: 2/21/2025
Model Overview
This is a contrastive image-text model built on the SigLIP 2 architecture. It offers improved semantic understanding and localization over the original SigLIP and provides multilingual vision-language encoding.
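As a sketch of basic usage, the following zero-shot classification example uses the Hugging Face transformers API. The checkpoint id `google/siglip2-so400m-patch16-512` and the file name are illustrative assumptions; substitute the identifier this card actually refers to.

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

ckpt = "google/siglip2-so400m-patch16-512"  # assumed Hub id
model = AutoModel.from_pretrained(ckpt)
processor = AutoProcessor.from_pretrained(ckpt)

image = Image.open("example.jpg")  # any RGB image (hypothetical file)
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

# SigLIP models are trained with a sigmoid loss, so each image-text pair
# gets an independent probability rather than a softmax over labels.
inputs = processor(text=labels, images=image, padding="max_length", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
probs = torch.sigmoid(outputs.logits_per_image)  # shape: (1, num_labels)
print({label: round(p.item(), 3) for label, p in zip(labels, probs[0])})
```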
Model Features
Improved semantic understanding
Built on the SigLIP 2 training recipe, offering better semantic understanding than earlier SigLIP models
Multilingual support
Provides multilingual vision-language encoding and can process text inputs in many languages
Zero-shot classification capability
Can classify images into new categories without specific training
Dense feature extraction
Extracts dense per-patch image features, supporting finer-grained image understanding (see the sketch after this list)
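A minimal feature-extraction sketch, again assuming the `google/siglip2-so400m-patch16-512` checkpoint id and the transformers API: dense per-patch tokens come from the vision tower, pooled embeddings from get_image_features / get_text_features, and the multilingual tokenizer accepts non-English prompts.

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

ckpt = "google/siglip2-so400m-patch16-512"  # assumed Hub id
model = AutoModel.from_pretrained(ckpt)
processor = AutoProcessor.from_pretrained(ckpt)

image = Image.open("example.jpg")  # hypothetical file
pixel_values = processor(images=image, return_tensors="pt").pixel_values

with torch.no_grad():
    # Dense per-patch tokens (batch, num_patches, hidden) for localization-style tasks.
    dense = model.vision_model(pixel_values=pixel_values).last_hidden_state
    # Pooled image embedding for retrieval or matching.
    image_emb = model.get_image_features(pixel_values=pixel_values)
    # Text embeddings; the multilingual tokenizer handles non-English prompts too.
    texts = ["a photo of a mountain", "ein Foto von einem Berg"]
    text_inputs = processor(text=texts, padding="max_length", return_tensors="pt")
    text_emb = model.get_text_features(**text_inputs)

print(dense.shape, image_emb.shape, text_emb.shape)
```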
Model Capabilities
Zero-shot image classification
Image-text matching
Multimodal feature extraction
Cross-modal retrieval
Use Cases
Image understanding
Zero-shot image classification
Classify images into new categories without specific training
Can accurately identify object categories in images
Image retrieval
Retrieve relevant images based on text descriptions
Enables efficient cross-modal retrieval
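As a sketch of how such retrieval could be set up (the checkpoint id and file names below are illustrative assumptions): embed a small image collection once, embed the text query, and rank the images by cosine similarity.

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

ckpt = "google/siglip2-so400m-patch16-512"  # assumed Hub id
model = AutoModel.from_pretrained(ckpt)
processor = AutoProcessor.from_pretrained(ckpt)

paths = ["beach.jpg", "city.jpg", "forest.jpg"]  # hypothetical image files
images = [Image.open(p) for p in paths]

with torch.no_grad():
    pixel_values = processor(images=images, return_tensors="pt").pixel_values
    image_emb = model.get_image_features(pixel_values=pixel_values)
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)

    query = processor(text=["a photo of tall buildings at night"],
                      padding="max_length", return_tensors="pt")
    text_emb = model.get_text_features(**query)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

scores = (text_emb @ image_emb.T).squeeze(0)  # cosine similarities per image
ranking = scores.argsort(descending=True)
print([(paths[i], round(scores[i].item(), 3)) for i in ranking])
```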
Multimodal applications
Image-text matching
Evaluate how well an image matches a text description
Applicable to scenarios like content moderation and ad matching
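A minimal matching sketch under the same assumptions as above: the sigmoid of each image-text logit acts as an independent match score, which a moderation or ad-matching pipeline could compare against a task-specific threshold.

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

ckpt = "google/siglip2-so400m-patch16-512"  # assumed Hub id
model = AutoModel.from_pretrained(ckpt)
processor = AutoProcessor.from_pretrained(ckpt)

image = Image.open("product.jpg")  # hypothetical file
captions = ["a red running shoe on a white background",
            "a bowl of fresh fruit"]

inputs = processor(text=captions, images=image,
                   padding="max_length", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # shape: (1, num_captions)

# Each score is an independent match probability in [0, 1]; the 0.3 threshold
# below is only an illustrative value, not a recommendation from this card.
match_probs = torch.sigmoid(logits)[0]
for caption, p in zip(captions, match_probs):
    verdict = "match" if p.item() > 0.3 else "no match"
    print(f"{p.item():.3f}  {verdict}  {caption}")
```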