Open-source SigLIP Model - Free Deployment for Zero-shot Image Classification Tasks!

Siglip Base Patch16 224

Developed by Xenova

SigLIP is a vision-language pre-trained model suitable for zero-shot image classification tasks.

Zero-Shot Image Classification

Transformers.js

#Zero-shot Image Classification #Multimodal Embedding #ONNX Compatible

Downloads 182

Release Time: 12/23/2023

Model Overview

SigLIP is a vision-language model pre-trained on image-text pairs. It is primarily used for zero-shot image classification, where an image is scored against candidate labels written as plain text.

Model Features

Zero-shot Image Classification

Classify images against text labels supplied at inference time, without task-specific training or fine-tuning.

Vision-Language Pre-training

Pre-trains the image and text encoders together, which gives the model its multimodal understanding capabilities.

ONNX Compatible

Ships ONNX weights, which makes the model usable on the web through Transformers.js.

Model Capabilities

Zero-shot Image Classification

Text Embedding Vector Calculation

Visual Embedding Vector Calculation

Use Cases

Image Classification

Animal Recognition

Identify the type of animal in an image, such as a cat or a dog.

In the sample output on this page, the label '2 cats' scores far higher than '2 dogs' for the same image.

Multimodal Applications

Image-Text Matching

Match images with textual descriptions for retrieval or classification.

Both the text and the vision model return 768-dimensional embeddings, which can be compared to match images with text.

🚀 Siglip Base Patch16-224 for Transformers.js

This repository provides ONNX weights for the google/siglip-base-patch16-224 model, making it compatible with the Transformers.js library. It enables zero-shot image classification and the computation of text and vision embeddings in a JavaScript environment.

🚀 Quick Start

📦 Installation

If you haven't already, you can install the Transformers.js JavaScript library from NPM using:

npm i @xenova/transformers

💻 Usage Examples

🔍 Basic Usage - Zero-shot Image Classification

import { pipeline } from '@xenova/transformers';

const classifier = await pipeline('zero-shot-image-classification', 'Xenova/siglip-base-patch16-224');
const url = 'http://images.cocodataset.org/val2017/000000039769.jpg';
const output = await classifier(url, ['2 cats', '2 dogs'], {
    hypothesis_template: 'a photo of {}',
});
console.log(output);
// [
//   { score: 0.16770583391189575, label: '2 cats' },
//   { score: 0.000022096000975579955, label: '2 dogs' }
// ]
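
Note that the two scores above do not sum to 1. SigLIP is trained with a sigmoid loss, and the output is consistent with each label being scored independently rather than through a softmax over the candidate labels. The sketch below illustrates the difference on made-up numbers: the `logits` values are hypothetical, chosen only for illustration, and are not taken from the model.

```javascript
// Illustrative only: the logits below are hypothetical, not real model outputs.

// Sigmoid scores each label on its own, so the scores need not sum to 1.
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

// Softmax normalizes across all labels, so the scores always sum to 1.
function softmax(xs) {
  const max = Math.max(...xs); // subtract the max for numerical stability
  const exps = xs.map((x) => Math.exp(x - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

const logits = [-1.6, -10.7]; // hypothetical image-text logits for two labels
const sigmoidScores = logits.map(sigmoid);
const softmaxScores = softmax(logits);

console.log(sigmoidScores); // both scores can be small at the same time
console.log(softmaxScores); // forced to sum to 1
```

One practical consequence is that a sigmoid score can be read per label: if none of the candidate labels fits the image, all scores may be low, whereas a softmax would still assign most of the probability mass to the least bad label.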

📝 Advanced Usage - Compute Text Embeddings

import { AutoTokenizer, SiglipTextModel } from '@xenova/transformers';

// Load tokenizer and text model
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/siglip-base-patch16-224');
const text_model = await SiglipTextModel.from_pretrained('Xenova/siglip-base-patch16-224');

// Run tokenization
const texts = ['a photo of 2 cats', 'a photo of 2 dogs'];
const text_inputs = tokenizer(texts, { padding: 'max_length', truncation: true });

// Compute embeddings
const { pooler_output } = await text_model(text_inputs);
// Tensor {
//   dims: [ 2, 768 ],
//   type: 'float32',
//   data: Float32Array(1536) [ ... ],
//   size: 1536
// }
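
The output above is a single flat `Float32Array` of 1536 values with `dims: [2, 768]`, that is, one 768-value embedding per input text. A minimal sketch of splitting such a buffer into per-text vectors follows. It assumes the data is laid out row by row (row-major); `splitRows` and `fakeOutput` are hypothetical names introduced here, and the stand-in uses 4 columns instead of 768 to keep the example short.

```javascript
// Split a flat buffer with dims [rows, cols] into one view per row.
// Assumes row-major layout: row r occupies indices [r * cols, (r + 1) * cols).
function splitRows(data, dims) {
  const [rows, cols] = dims;
  if (data.length !== rows * cols) {
    throw new Error(`Expected ${rows * cols} values, got ${data.length}`);
  }
  const out = [];
  for (let r = 0; r < rows; r++) {
    // subarray() returns a view on the same memory; nothing is copied.
    out.push(data.subarray(r * cols, (r + 1) * cols));
  }
  return out;
}

// Hypothetical stand-in for pooler_output, with 4 columns instead of 768.
const fakeOutput = {
  dims: [2, 4],
  data: Float32Array.from([1, 2, 3, 4, 5, 6, 7, 8]),
};

const [catsEmbedding, dogsEmbedding] = splitRows(fakeOutput.data, fakeOutput.dims);
console.log(catsEmbedding); // first row: values 1, 2, 3, 4
console.log(dogsEmbedding); // second row: values 5, 6, 7, 8
```

With the real output, the same call would be `splitRows(pooler_output.data, pooler_output.dims)`, using only the `data` and `dims` fields shown in the printed tensor.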

🖼️ Advanced Usage - Compute Vision Embeddings

import { AutoProcessor, SiglipVisionModel, RawImage } from '@xenova/transformers';

// Load processor and vision model
const processor = await AutoProcessor.from_pretrained('Xenova/siglip-base-patch16-224');
const vision_model = await SiglipVisionModel.from_pretrained('Xenova/siglip-base-patch16-224');

// Read image and run processor
const image = await RawImage.read('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/football-match.jpg');
const image_inputs = await processor(image);

// Compute embeddings
const { pooler_output } = await vision_model(image_inputs);
// Tensor {
//   dims: [ 1, 768 ],
//   type: 'float32',
//   data: Float32Array(768) [ ... ],
//   size: 768
// }
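
Since the text and vision embeddings have the same width (768), they can be compared for image-text matching. A common approach is cosine similarity, sketched below with toy 3-dimensional vectors standing in for the real embeddings. This is an illustration under assumptions, not the library's own scoring: `cosineSimilarity` and `rankTexts` are hypothetical helpers, and the full SigLIP score additionally applies a learned scale and bias before the sigmoid, which this sketch omits. The resulting values are therefore suited to ranking candidates, not to being read as probabilities.

```javascript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom; // treat zero vectors as dissimilar
}

// Rank candidate texts by similarity to a single image embedding.
function rankTexts(imageEmbedding, textEmbeddings, labels) {
  return labels
    .map((label, i) => ({
      label,
      similarity: cosineSimilarity(imageEmbedding, textEmbeddings[i]),
    }))
    .sort((x, y) => y.similarity - x.similarity);
}

// Toy vectors standing in for the 768-dimensional embeddings.
const imageEmbedding = Float32Array.from([1, 0, 1]);
const textEmbeddings = [
  Float32Array.from([0, 1, 0]), // orthogonal to the image vector
  Float32Array.from([2, 0, 2]), // same direction as the image vector
];

const ranked = rankTexts(imageEmbedding, textEmbeddings, ['text A', 'text B']);
console.log(ranked[0].label); // prints: text B
```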

⚠️ Important Note

Having a separate repo for ONNX weights is intended to be a temporary solution until WebML gains more traction. If you would like to make your models web-ready, we recommend converting to ONNX using 🤗 Optimum and structuring your repo like this one (with ONNX weights located in a subfolder named onnx).

Property	Details
Base Model	google/siglip-base-patch16-224
Library Name	transformers.js
Pipeline Tag	zero-shot-image-classification
