ViT-SO400M-16-SigLIP2-384

Developed by timm
A SigLIP 2 vision-language model trained on the WebLI dataset, supporting zero-shot image classification.
Downloads 106.30k
Release date: 2/21/2025

Model Overview

This is a contrastive image-text model designed for zero-shot image classification. It pairs a shape-optimized ViT image encoder (SO400M, patch size 16, 384×384 input) with a text encoder, embedding both modalities into a shared space where semantic similarity can be scored.

Model Features

Zero-shot classification
Classifies images into new categories without task-specific training
Improved semantic understanding
The SigLIP 2 training recipe improves semantic understanding and localization over the original SigLIP
Dense feature extraction
Extracts dense, spatially resolved feature representations from images
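
Dense features from a ViT come out as a flat sequence of patch tokens; recovering a 2-D feature map is a simple reshape. A minimal sketch, assuming a 384-pixel input with 16-pixel patches (24×24 = 576 tokens) and that any class/pooling tokens have already been stripped; the function name and shapes are illustrative, not the timm API:

```python
import numpy as np

def tokens_to_feature_map(patch_tokens, image_size=384, patch_size=16):
    """Reshape flat ViT patch tokens into a dense 2-D feature map.

    For a 384px input with 16px patches there are 24*24 = 576 tokens.
    Assumes any class/pooling token has already been removed.
    """
    grid = image_size // patch_size          # 24 for this model
    n, d = patch_tokens.shape
    assert n == grid * grid, "token count must match the patch grid"
    return patch_tokens.reshape(grid, grid, d)
```

The resulting (24, 24, D) map can then feed dense tasks such as segmentation or localization probes.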

Model Capabilities

Zero-shot image classification
Image-text semantic matching
Multimodal feature extraction
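
The matching step behind these capabilities can be sketched in a few lines: both towers produce L2-normalized embeddings, and SigLIP scores each image-text pair independently with a sigmoid over a scaled cosine similarity, rather than a softmax across classes. A minimal numpy sketch; `logit_scale` and `logit_bias` are illustrative placeholders, not the checkpoint's learned values:

```python
import numpy as np

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def siglip_zero_shot_probs(image_feat, text_feats,
                           logit_scale=100.0, logit_bias=-10.0):
    """Score one image embedding against N class-text embeddings.

    SigLIP applies an independent per-pair sigmoid to the scaled
    cosine similarity, so probabilities need not sum to 1.
    logit_scale / logit_bias stand in for the learned parameters.
    """
    img = l2_normalize(image_feat)
    txt = l2_normalize(text_feats)
    logits = logit_scale * (txt @ img) + logit_bias
    return 1.0 / (1.0 + np.exp(-logits))
```

For classification, the predicted label is simply the class whose per-pair probability is highest.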

Use Cases

Image understanding
Food recognition
Identifying food categories such as donuts and beignets; in the bundled example, the correct label ("beignets") receives the highest probability
Animal recognition
Distinguishing between animal categories such as cats and dogs
Content moderation
Inappropriate content detection
Identifying potentially inappropriate content in images
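
In use cases like these, the candidate labels are usually wrapped in a short natural-language template before being encoded by the text tower. A minimal sketch; the default template string is an assumption, not necessarily the one used in the model's demo:

```python
def build_prompts(labels, template="a photo of {}."):
    # Wrap each class name in a prompt template; zero-shot accuracy is
    # known to be sensitive to template choice, so this default is
    # purely illustrative.
    return [template.format(label) for label in labels]
```

For example, `build_prompts(["beignets", "donuts"])` yields `["a photo of beignets.", "a photo of donuts."]`, which would then be encoded and scored against the image embedding.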