
ViT-B-16-SigLIP-384

Developed by timm
SigLIP (Sigmoid Loss for Language-Image Pre-training) model trained on the WebLI dataset for zero-shot image classification tasks
Downloads 4,119
Release Time: 10/16/2023

Model Overview

This is a contrastive image-text model pretrained with a sigmoid loss and suited to zero-shot image classification. It uses a ViT-B/16 image encoder at 384×384 input resolution and was trained on the WebLI dataset.
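A minimal zero-shot classification sketch with OpenCLIP is shown below. It assumes the open_clip_torch package is installed and that the weights are available under the hub id timm/ViT-B-16-SigLIP-384; the image path and label list are placeholders for illustration.

```python
import torch
import torch.nn.functional as F
from PIL import Image
import open_clip

# Load the SigLIP model and its preprocessing transform from the Hugging Face Hub
model, preprocess = open_clip.create_model_from_pretrained('hf-hub:timm/ViT-B-16-SigLIP-384')
tokenizer = open_clip.get_tokenizer('hf-hub:timm/ViT-B-16-SigLIP-384')
model.eval()

# 'example.jpg' and the labels below are placeholders
image = preprocess(Image.open('example.jpg')).unsqueeze(0)
labels = ["a photo of a dog", "a photo of a cat", "a photo of a car"]
text = tokenizer(labels, context_length=model.context_length)

with torch.no_grad():
    img_emb = F.normalize(model.encode_image(image), dim=-1)
    txt_emb = F.normalize(model.encode_text(text), dim=-1)
    # SigLIP scores each image-text pair independently with a sigmoid,
    # so probabilities do not have to sum to 1 across labels
    probs = torch.sigmoid(img_emb @ txt_emb.T * model.logit_scale.exp() + model.logit_bias)

for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```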

Model Features

Sigmoid loss function
Uses a sigmoid loss for language-image pretraining, which is reported to outperform the conventional softmax contrastive loss (a minimal sketch follows this list)
Zero-shot learning capability
Can classify images into new categories without requiring specific category training
High-resolution input
Supports high-resolution image input at 384x384 pixels
Multi-framework support
Supports both the OpenCLIP (image + text) and timm (image-only) frameworks
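To make the sigmoid-loss feature concrete, here is a minimal PyTorch sketch of the pairwise sigmoid loss described in the SigLIP paper. The function name is illustrative, and the scalars t (logit scale) and b (logit bias) are learned parameters in actual training.

```python
import torch
import torch.nn.functional as F

def sigmoid_pairwise_loss(img_emb, txt_emb, t, b):
    """Pairwise sigmoid image-text loss (sketch).

    img_emb, txt_emb: (N, D) L2-normalized embeddings for N matched pairs.
    t, b: learned logit scale (temperature) and bias scalars.
    """
    n = img_emb.size(0)
    logits = img_emb @ txt_emb.T * t + b  # (N, N) pairwise similarities
    # +1 on the diagonal (matching pairs), -1 everywhere else (non-matching pairs)
    signs = 2 * torch.eye(n, device=logits.device) - 1
    # Each image-text pair is treated as an independent binary classification problem
    return -F.logsigmoid(signs * logits).sum() / n
```

Because every pair is scored independently, no softmax normalization over the batch is required, which is the main practical difference from the standard contrastive loss.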

Model Capabilities

Zero-shot image classification
Image-text matching
Image feature extraction (see the timm sketch after this list)
Multimodal understanding
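For image-only feature extraction, the image tower can be loaded through timm. The sketch below assumes the timm model name vit_base_patch16_siglip_384 and uses a placeholder image path.

```python
import timm
import torch
from PIL import Image

# Image tower only (no text encoder); num_classes=0 returns the pooled embedding
model = timm.create_model('vit_base_patch16_siglip_384', pretrained=True, num_classes=0)
model.eval()

# Build the matching preprocessing (resize to 384x384, normalize) from the model's config
cfg = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**cfg, is_training=False)

img = Image.open('example.jpg').convert('RGB')  # placeholder path
with torch.no_grad():
    features = model(transform(img).unsqueeze(0))  # shape (1, 768) for the ViT-B/16 tower
print(features.shape)
```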

Use Cases

Content classification
Social media image classification
Automatically classify and tag images on social media
Can accurately identify objects, scenes, and activities in images
E-commerce
Product image classification
Automatically classify product images on e-commerce platforms
No need to train separate models for each product category
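A hedged sketch of that workflow, again assuming the open_clip_torch package and the hub id timm/ViT-B-16-SigLIP-384; the category names and file path are placeholders. The category text embeddings are computed once and reused for every incoming image, so adding a new product category only means adding a new prompt.

```python
import torch
import torch.nn.functional as F
from PIL import Image
import open_clip

model, preprocess = open_clip.create_model_from_pretrained('hf-hub:timm/ViT-B-16-SigLIP-384')
tokenizer = open_clip.get_tokenizer('hf-hub:timm/ViT-B-16-SigLIP-384')
model.eval()

# Illustrative product taxonomy; no per-category training is needed
categories = ["running shoes", "wireless headphones", "coffee maker", "backpack"]
prompts = [f"a product photo of a {c}." for c in categories]

with torch.no_grad():
    # Encode the category prompts once and reuse them for every image
    cat_emb = F.normalize(
        model.encode_text(tokenizer(prompts, context_length=model.context_length)), dim=-1)

def classify_product(path):
    """Return the best-matching category for one product image."""
    with torch.no_grad():
        img_emb = F.normalize(
            model.encode_image(preprocess(Image.open(path).convert('RGB')).unsqueeze(0)), dim=-1)
        scores = torch.sigmoid(img_emb @ cat_emb.T * model.logit_scale.exp() + model.logit_bias)
    return categories[scores.argmax().item()]

print(classify_product('product.jpg'))  # placeholder path
```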