Open-source owlvit-base-patch32 model - A zero-shot object detection tool that can detect new categories of objects without fine-tuning

Owlvit Base Patch32

Developed by Xenova

OWL-ViT is a zero-shot object detection model based on Vision Transformer, capable of detecting objects of new categories without fine-tuning.

Object Detection

Transformers

#Zero-shot object detection #ONNX format #Web adaptation

Downloads 86

Release Time : 11/13/2023

Model Overview

This model is a zero-shot object detection model based on the Transformer architecture, which can detect objects in images based on provided text labels without training for specific categories.

Model Features

Zero-shot detection capability

Can detect objects of new categories without training for specific classes.

Text-guided detection

Specifies the object categories to be detected through text descriptions.

Transformer-based architecture

Adopts the Vision Transformer architecture, combining text and image information.

Web adaptation

Provides ONNX format weights for easy use in browser environments.

Model Capabilities

Zero-shot object detection

Multi-category object recognition

Text-guided image analysis

Bounding box prediction

Use Cases

Image analysis

Object detection

Detects specified categories of objects in images.

Returns detected object categories, confidence scores, and bounding box coordinates.

Content moderation

Sensitive content detection

Detects the presence of specific types of sensitive content in images.

🚀 Zero-shot Object Detection with Transformers.js

This project uses the google/owlvit-base-patch32 model with ONNX weights to be compatible with the transformers.js library. It enables zero-shot object detection tasks.

🚀 Quick Start

📦 Installation

If you haven't already, you can install the Transformers.js JavaScript library from NPM using:

npm i @xenova/transformers

💻 Usage Examples

🔍 Basic Usage

Example: Zero-shot object detection w/ Xenova/owlvit-base-patch32.

import { pipeline } from '@xenova/transformers';

const detector = await pipeline('zero-shot-object-detection', 'Xenova/owlvit-base-patch32');

const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/astronaut.png';
const candidate_labels = ['human face', 'rocket', 'helmet', 'american flag'];
const output = await detector(url, candidate_labels);
// [
//   { score: 0.24392342567443848, label: 'human face', box: { xmin: 180, ymin: 67, xmax: 274, ymax: 175 } },
//   { score: 0.15129457414150238, label: 'american flag', box: { xmin: 0, ymin: 4, xmax: 106, ymax: 513 } },
//   { score: 0.13649864494800568, label: 'helmet', box: { xmin: 277, ymin: 337, xmax: 511, ymax: 511 } },
//   { score: 0.10262022167444229, label: 'rocket', box: { xmin: 352, ymin: -1, xmax: 463, ymax: 287 } }
// ]

image/png

🌟 Advanced Usage

Example: Zero-shot object detection w/ Xenova/owlvit-base-patch32 (additional parameters).

import { pipeline } from '@xenova/transformers';

const detector = await pipeline('zero-shot-object-detection', 'Xenova/owlvit-base-patch32');

const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/beach.png';
const candidate_labels = ['hat', 'book', 'sunglasses', 'camera'];
const output = await detector(url, candidate_labels, { topk: 4, threshold: 0.05 });
// [
//   { score: 0.1606510728597641, label: 'sunglasses', box: { xmin: 347, ymin: 229, xmax: 429, ymax: 264 } },
//   { score: 0.08935828506946564, label: 'hat', box: { xmin: 38, ymin: 174, xmax: 258, ymax: 364 } },
//   { score: 0.08530698716640472, label: 'camera', box: { xmin: 187, ymin: 350, xmax: 260, ymax: 411 } },
//   { score: 0.08349756896495819, label: 'book', box: { xmin: 261, ymin: 280, xmax: 494, ymax: 425 } }
// ]

image/png

📚 Documentation

⚠️ Important Note

Having a separate repo for ONNX weights is intended to be a temporary solution until WebML gains more traction. If you would like to make your models web-ready, we recommend converting to ONNX using 🤗 Optimum and structuring your repo like this one (with ONNX weights located in a subfolder named onnx).

📄 Model Information

Property	Details
Base Model	google/owlvit-base-patch32
Library Name	transformers.js
Pipeline Tag	zero-shot-object-detection

Featured Recommended AI Models

Empowering the Future, Your AI Solution Knowledge Base

English 简体中文繁體中文にほんご