
ViT-B-16-SigLIP2-384

Developed by timm
A SigLIP 2 vision-language model trained on the WebLI dataset, suitable for zero-shot image classification
Downloads 1,497
Release Date: 2/21/2025

Model Overview

This is a contrastive image-text model pre-trained with a pairwise sigmoid loss. It supports multilingual vision-language encoding and offers improved semantic understanding and localization over the original SigLIP.
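To make the training objective concrete, below is a minimal numpy sketch of the pairwise sigmoid loss used for SigLIP-style pre-training. It is not the released training code; the `logit_scale` and `logit_bias` values are illustrative placeholders, not the model's trained parameters.

```python
import numpy as np

def siglip_loss(img_emb, txt_emb, logit_scale=10.0, logit_bias=-10.0):
    """Pairwise sigmoid loss, SigLIP-style (minimal numpy sketch).

    img_emb, txt_emb: L2-normalized embeddings of shape (batch, dim).
    Each matched pair (i, i) is a positive; every (i, j) with i != j
    is a negative. Unlike CLIP's softmax, each pair is scored
    independently, so no global normalization over the batch is needed.
    """
    logits = logit_scale * img_emb @ txt_emb.T + logit_bias  # (batch, batch)
    labels = 2.0 * np.eye(len(img_emb)) - 1.0                # +1 diagonal, -1 elsewhere
    # Numerically stable -log(sigmoid(labels * logits))
    z = labels * logits
    loss = np.maximum(-z, 0.0) + np.log1p(np.exp(-np.abs(z)))
    return loss.mean()
```

Because each image-text pair contributes an independent binary term, the loss decomposes over pairs, which is what lets sigmoid pre-training scale gracefully with batch size.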

Model Features

Sigmoid loss function
Uses a pairwise sigmoid loss for language-image pre-training, which scores each image-text pair independently instead of normalizing over the whole batch as in CLIP-style softmax training
Improved semantic understanding
Offers better semantic understanding and localization capabilities compared to previous models
Dense feature extraction
Capable of extracting dense features from images, suitable for more complex visual tasks
Multilingual support
Supports multilingual vision-language encoding

Model Capabilities

Zero-shot image classification
Image semantic understanding
Image-text contrastive learning
Multilingual vision-language encoding

Use Cases

Computer vision
Zero-shot image classification
Classify images without specific training
The model card example correctly recognizes beignets with high confidence
Visual semantic understanding
Understand semantic content within images
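In practice, the model is loaded through `open_clip` from the Hugging Face Hub (`hf-hub:timm/ViT-B-16-SigLIP2-384`), which requires downloading the weights. The sketch below instead shows just the zero-shot scoring step with placeholder embeddings; the `logit_scale` and `logit_bias` values are illustrative, not the trained ones.

```python
import numpy as np

def zero_shot_classify(image_emb, class_text_embs,
                       logit_scale=10.0, logit_bias=-10.0):
    """Score one image embedding against per-class text embeddings.

    SigLIP-style models produce an independent sigmoid probability per
    class prompt rather than a softmax over classes. All embeddings are
    assumed L2-normalized, as produced by the model's encoders.
    """
    logits = logit_scale * class_text_embs @ image_emb + logit_bias
    probs = 1.0 / (1.0 + np.exp(-logits))   # per-class probabilities
    return probs, int(np.argmax(probs))

# Toy usage with hand-built unit embeddings standing in for encoder outputs
image_emb = np.array([1.0, 0.0, 0.0])
class_embs = np.array([[1.0, 0.0, 0.0],    # e.g. "a photo of a beignet"
                       [0.0, 1.0, 0.0]])   # e.g. "a photo of a dog"
probs, best = zero_shot_classify(image_emb, class_embs)
```

Because the per-class probabilities are independent, they need not sum to one, which also makes the scores usable for multi-label retrieval, not just single-label classification.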
© 2025 AIbase