# CLIP-ViT-Base-Patch16 with Transformers.js

This project provides the ONNX weights of openai/clip-vit-base-patch16 to make it compatible with Transformers.js, enabling seamless use in JavaScript environments.

## 🚀 Quick Start

### 📦 Installation

If you haven't already, you can install the Transformers.js JavaScript library from NPM using the following command:

```bash
npm i @xenova/transformers
```

## 💻 Usage Examples

### 📝 Basic Usage

Perform zero-shot image classification with the `pipeline` API:

```js
import { pipeline } from '@xenova/transformers';

const classifier = await pipeline('zero-shot-image-classification', 'Xenova/clip-vit-base-patch16');
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg';
const output = await classifier(url, ['tiger', 'horse', 'dog']);
```
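
The classifier returns one `{ label, score }` entry per candidate label. The values below are illustrative only; exact scores depend on the backend and quantization:

```js
// Example output (scores illustrative):
// [
//   { label: 'tiger', score: 0.998 },
//   { label: 'horse', score: 0.001 },
//   { label: 'dog', score: 0.001 }
// ]
```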

### ⚙️ Advanced Usage

Perform zero-shot image classification with `CLIPModel`:
```js
import { AutoTokenizer, AutoProcessor, CLIPModel, RawImage } from '@xenova/transformers';

// Load tokenizer, processor, and model
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/clip-vit-base-patch16');
const processor = await AutoProcessor.from_pretrained('Xenova/clip-vit-base-patch16');
const model = await CLIPModel.from_pretrained('Xenova/clip-vit-base-patch16');

// Run tokenization
const texts = ['a photo of a car', 'a photo of a football match'];
const text_inputs = tokenizer(texts, { padding: true, truncation: true });

// Read image and run processor
const image = await RawImage.read('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/football-match.jpg');
const image_inputs = await processor(image);

// Run model with both text and pixel inputs
const output = await model({ ...text_inputs, ...image_inputs });
```
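
The `output` object includes a `logits_per_image` tensor holding image-text similarity scores. As a minimal sketch (assuming the `softmax` helper exported by the library), these can be converted into probabilities over the text prompts:

```js
import { softmax } from '@xenova/transformers';

// logits_per_image has shape [num_images, num_texts]; with a single
// image, its raw data is one similarity score per text prompt.
const probs = softmax(output.logits_per_image.data);
```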

Compute text embeddings with `CLIPTextModelWithProjection`:
```js
import { AutoTokenizer, CLIPTextModelWithProjection } from '@xenova/transformers';

// Load tokenizer and text model
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/clip-vit-base-patch16');
const text_model = await CLIPTextModelWithProjection.from_pretrained('Xenova/clip-vit-base-patch16');

// Run tokenization
const texts = ['a photo of a car', 'a photo of a football match'];
const text_inputs = tokenizer(texts, { padding: true, truncation: true });

// Compute embeddings
const { text_embeds } = await text_model(text_inputs);
```
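
Since both prompts are embedded in the same space, they can be compared directly. A minimal sketch, assuming the `cos_sim` helper exported by the library:

```js
import { cos_sim } from '@xenova/transformers';

// Convert the embedding tensor to nested arrays and compare the two rows
const [embeds_car, embeds_football] = text_embeds.tolist();
const similarity = cos_sim(embeds_car, embeds_football);
```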

Compute vision embeddings with `CLIPVisionModelWithProjection`:
```js
import { AutoProcessor, CLIPVisionModelWithProjection, RawImage } from '@xenova/transformers';

// Load processor and vision model
const processor = await AutoProcessor.from_pretrained('Xenova/clip-vit-base-patch16');
const vision_model = await CLIPVisionModelWithProjection.from_pretrained('Xenova/clip-vit-base-patch16');

// Read image and run processor
const image = await RawImage.read('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/football-match.jpg');
const image_inputs = await processor(image);

// Compute embeddings
const { image_embeds } = await vision_model(image_inputs);
```
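
Because CLIP projects text and images into a shared embedding space, the text and vision snippets above can be combined for image-text matching. A minimal sketch, assuming `cos_sim` and the `text_embeds`/`texts` values computed earlier:

```js
import { cos_sim } from '@xenova/transformers';

// Rank each text prompt by cosine similarity to the image embedding
const [image_embedding] = image_embeds.tolist();
const scores = text_embeds.tolist().map((text_embedding, i) => ({
  text: texts[i],
  score: cos_sim(image_embedding, text_embedding),
}));
```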

## ⚠️ Important Note

Having a separate repo for ONNX weights is intended to be a temporary solution until WebML gains more traction. If you would like to make your models web-ready, we recommend converting to ONNX using [🤗 Optimum](https://huggingface.co/docs/optimum) and structuring your repo like this one (with ONNX weights located in a subfolder named `onnx`).
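
For reference, such a repo might be laid out as follows (file names illustrative; quantized variants optional):

```
clip-vit-base-patch16/
├── config.json
├── preprocessor_config.json
├── tokenizer.json
├── tokenizer_config.json
└── onnx/
    ├── model.onnx
    └── model_quantized.onnx
```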