ViT-gopt-16-SigLIP2-256

Developed by timm
A SigLIP 2 vision-language model trained on the WebLI dataset, suitable for zero-shot image classification.
Downloads 43.20k
Release Time: 2/21/2025

Model Overview

This model is a contrastive image-text model primarily used for zero-shot image classification. It has been converted from Big Vision's original JAX checkpoint to a version compatible with OpenCLIP.
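As a rough illustration of how a contrastive image-text model performs zero-shot classification, the sketch below scores one image embedding against several label embeddings. It is a minimal sketch, not this model's actual inference code: the embeddings are toy vectors standing in for encoder outputs, and the `logit_scale` and `logit_bias` values are hypothetical placeholders for parameters a real checkpoint would learn. SigLIP-style models score each image-text pair independently with a sigmoid rather than applying a softmax across labels.

```python
import math

def normalize(v):
    """Scale a vector to unit length so the dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def zero_shot_scores(image_emb, text_embs, logit_scale=10.0, logit_bias=-5.0):
    """Return one sigmoid score per candidate label.

    logit_scale and logit_bias are hypothetical values chosen for this toy
    example; a trained model supplies its own learned parameters.
    """
    img = normalize(image_emb)
    scores = []
    for text_emb in text_embs:
        txt = normalize(text_emb)
        cosine = sum(a * b for a, b in zip(img, txt))
        logit = logit_scale * cosine + logit_bias
        # Each pair is scored independently, so scores need not sum to 1.
        scores.append(1.0 / (1.0 + math.exp(-logit)))
    return scores

# Toy stand-ins for encoder outputs (assumed, not real model embeddings).
labels = ["a beignet", "a bagel", "a donut"]
image_emb = [0.9, 0.1, 0.2]
text_embs = [
    [1.0, 0.0, 0.1],
    [0.0, 1.0, 0.0],
    [0.3, 0.2, 0.9],
]

scores = zero_shot_scores(image_emb, text_embs)
best_label = labels[max(range(len(labels)), key=lambda i: scores[i])]
print(best_label)
```

With a real checkpoint, the image and text embeddings would come from the model's two encoders, and the candidate labels would typically be written as short text prompts.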

Model Features

SigLIP 2 Architecture
Uses the SigLIP 2 architecture, which improves on the original SigLIP in semantic understanding, localization, and dense feature extraction.
Multilingual Support
Supports multilingual text input (inferred from the paper).
Zero-shot Classification
Performs image classification tasks without fine-tuning.

Model Capabilities

Zero-shot Image Classification
Image-Text Contrastive Learning
Multilingual Understanding

Use Cases

Image Understanding
Zero-shot Image Classification
Classifies images into candidate labels without task-specific training.
The examples show correct identification of foods such as beignets.