
ViT-B/32 SigLIP 2 256

Developed by timm
A SigLIP 2 vision-language model trained on the WebLI dataset, supporting zero-shot image classification tasks
Downloads 691
Release Date: February 21, 2025

Model Overview

This is a contrastive image-text model designed for zero-shot image classification. It adopts the SigLIP 2 architecture and was trained on the WebLI dataset, enabling it to model semantic relationships between images and text.
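As a rough illustration of how such a contrastive model scores image-text pairs: SigLIP-style models use a sigmoid over each individual image-text similarity rather than a softmax over a batch, so each pairing gets an independent match probability. The sketch below uses random stand-in embeddings and fixed scale/bias values purely for illustration; real values come from the trained encoders.

```python
import numpy as np

# Toy sketch of SigLIP-style scoring. Unlike a softmax-contrastive model,
# each image-text pair is scored independently with a sigmoid.
# The embeddings below are random placeholders for real encoder outputs.
rng = np.random.default_rng(0)

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

image_emb = l2_normalize(rng.normal(size=(2, 8)))  # 2 images
text_emb = l2_normalize(rng.normal(size=(3, 8)))   # 3 candidate labels

# SigLIP learns a logit scale and bias; fixed values here for illustration.
logit_scale, logit_bias = 10.0, -5.0
logits = image_emb @ text_emb.T * logit_scale + logit_bias
probs = 1.0 / (1.0 + np.exp(-logits))  # independent pairwise probabilities

print(probs.shape)  # (2, 3): one match probability per image-label pair
```

Because each probability is independent, an image can plausibly match several labels at once, or none, which is convenient for open-ended zero-shot classification.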

Model Features

SigLIP 2 Architecture
Utilizes the improved SigLIP 2 architecture with enhanced semantic understanding, localization, and dense feature extraction capabilities
Zero-shot Classification
Performs image classification on new categories without task-specific training
Multilingual Support
Supports multilingual text input (as described in the SigLIP 2 paper)
Efficient Visual Encoding
Uses Vision Transformer architecture to efficiently encode image features
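To make the visual-encoding feature concrete, the sketch below shows the patch-embedding geometry this model name implies: a 256x256 input split into 32x32 patches gives (256/32)^2 = 64 tokens, each projected to the assumed ViT-Base width of 768. The weights are random placeholders; this only demonstrates the tokenization step, not the full transformer.

```python
import numpy as np

# Sketch of ViT patch embedding for the assumed ViT-B/32 @ 256 geometry.
rng = np.random.default_rng(0)
image = rng.normal(size=(256, 256, 3))  # placeholder input image

patch, dim = 32, 768        # patch size and assumed ViT-Base embed width
n = 256 // patch            # 8 patches per side -> 64 tokens total

# Split into non-overlapping 32x32 patches, then flatten each patch.
patches = image.reshape(n, patch, n, patch, 3).transpose(0, 2, 1, 3, 4)
patches = patches.reshape(n * n, patch * patch * 3)  # (64, 3072)

# Linear projection to the token dimension (random placeholder weights).
proj = rng.normal(size=(patch * patch * 3, dim)) * 0.02
tokens = patches @ proj  # (64, 768) token sequence fed to the transformer

print(tokens.shape)
```

The 32-pixel patch size is what makes this variant comparatively cheap: only 64 tokens per 256px image, versus 256 tokens for a /16 model at the same resolution.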

Model Capabilities

Zero-shot image classification
Image-text matching
Multimodal feature extraction

Use Cases

Image Understanding
Zero-shot Image Classification
Classifies images without task-specific training, supporting dynamic addition of new categories
Outputs probability distributions for various categories
Image Retrieval
Retrieves relevant images based on text descriptions
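The retrieval use case reduces to ranking precomputed image embeddings by similarity to a text-query embedding. The sketch below uses synthetic unit-norm embeddings in place of real SigLIP 2 encoder outputs; one gallery entry is copied as the query so the expected best match is known.

```python
import numpy as np

# Toy sketch of text-to-image retrieval with a contrastive model:
# rank a gallery of (placeholder) image embeddings by cosine similarity
# to a text-query embedding. Real embeddings would come from the
# SigLIP 2 image and text encoders.
rng = np.random.default_rng(1)

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

gallery = l2_normalize(rng.normal(size=(5, 16)))  # 5 image embeddings
query = gallery[3].copy()                         # query matching image 3

scores = gallery @ query       # cosine similarity (all vectors unit-norm)
ranking = np.argsort(-scores)  # gallery indices, best match first

print(int(ranking[0]))  # index of the most similar image
```

In practice the gallery embeddings are computed once offline, so each text query costs only a single matrix-vector product (or an approximate nearest-neighbor lookup at scale).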