
ViT-L-16-SigLIP2-512

Developed by timm
A SigLIP 2 vision-language model trained on the WebLI dataset, supporting zero-shot image classification.
Downloads: 147
Release Date: 2/21/2025

Model Overview

This is a contrastive image-text model built on the SigLIP 2 architecture, used primarily for zero-shot image classification. Trained on the WebLI dataset, it learns semantic relationships between images and text.
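In practice the model is loaded through `timm`/`open_clip`, but the zero-shot classification step itself reduces to comparing a normalized image embedding against normalized text embeddings of candidate labels. A minimal numeric sketch of that scoring step, using toy 4-dimensional embeddings (the vectors and the temperature/bias values are illustrative placeholders, not the model's real 1024-dimensional outputs or trained parameters):

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs, labels, t=1.0, b=0.0):
    """Pick the label whose text embedding best matches the image.

    SigLIP-style scoring: cosine similarity scaled by a learned
    temperature t and shifted by a learned bias b, squashed with a
    sigmoid. The t and b defaults here are placeholders, not the
    model's trained values.
    """
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = t * txt @ img + b
    probs = 1.0 / (1.0 + np.exp(-logits))  # independent per-label scores
    return labels[int(np.argmax(probs))], probs

# Toy embeddings: the first text vector is closest to the image vector.
image = np.array([0.9, 0.1, 0.0, 0.1])
texts = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0]])
label, probs = zero_shot_classify(image, texts, ["beignets", "cat", "car"])
```

Note that unlike CLIP's softmax, the sigmoid scores are not normalized across labels; each label gets an independent match probability, and classification takes the argmax.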

Model Features

SigLIP 2 Architecture
Uses the improved SigLIP 2 training recipe, with better semantic understanding, localization, and dense-feature extraction than the original SigLIP
Zero-shot Learning
Supports zero-shot image classification of new categories without task-specific fine-tuning
Multilingual Support
Supports multilingual text input, as described in the SigLIP 2 paper
Efficient Contrastive Learning
Uses a pairwise sigmoid loss for vision-language pretraining, which avoids the full-batch softmax normalization of CLIP-style contrastive learning and improves training efficiency
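The sigmoid loss treats every image-text pair in a batch as an independent binary classification (match vs. non-match), rather than normalizing over the whole batch as CLIP's softmax does. A minimal sketch of the loss, assuming N paired image/text embeddings; the temperature and bias defaults are illustrative initializations, not the model's learned values:

```python
import numpy as np

def siglip_loss(img_embs, txt_embs, t=10.0, b=-10.0):
    """Pairwise sigmoid loss in the style of SigLIP.

    Each (image i, text j) pair is scored independently: the diagonal
    pairs are positives (label +1), all off-diagonal pairs negatives
    (label -1). t and b are learned in the real model; the values
    here are placeholder initializations.
    """
    img = img_embs / np.linalg.norm(img_embs, axis=1, keepdims=True)
    txt = txt_embs / np.linalg.norm(txt_embs, axis=1, keepdims=True)
    logits = t * img @ txt.T + b              # (N, N) similarity matrix
    z = 2.0 * np.eye(len(img)) - 1.0          # +1 on diagonal, -1 off
    # -log sigmoid(z * logits), written stably via logaddexp
    loss = np.logaddexp(0.0, -z * logits)
    return loss.mean()
```

Because no cross-example normalization is needed, the loss decomposes over pairs, which is what makes sigmoid pretraining cheaper to scale across devices than softmax contrastive training.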

Model Capabilities

Image-text contrastive learning
Zero-shot image classification
Multimodal feature extraction

Use Cases

Image Understanding
Zero-shot Image Classification
Classify images into new categories without additional training; examples show accurate recognition of food categories such as beignets
Multimodal Applications
Image-Text Matching
Calculate similarity between images and text descriptions