Owlvit Base Patch32

Developed by Xenova
OWL-ViT is a zero-shot object detection model based on Vision Transformer, capable of detecting objects of new categories without fine-tuning.
Downloads 86
Release Time: 11/13/2023

Model Overview

Given an image and a list of free-text labels, the model returns bounding boxes for the objects that match those labels. The labels are supplied at inference time, so no category-specific training or fine-tuning is needed.

Model Features

Zero-shot detection capability
Can detect objects of new categories without training for specific classes.
Text-guided detection
Specifies the object categories to be detected through text descriptions.
Transformer-based architecture
Uses a Vision Transformer image encoder together with a text encoder, so that image regions can be matched against text queries.
Web adaptation
Provides ONNX format weights for easy use in browser environments.
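
The idea behind zero-shot, text-guided detection can be illustrated with a toy sketch: each candidate image region and each text label is represented by an embedding vector, and a region is assigned the label whose embedding it is most similar to. The function names, the `Match` type, and the two-dimensional vectors used in the test below are all hypothetical; the real model produces its embeddings with learned encoders and applies further learned scaling before scoring, none of which is reproduced here.

```typescript
// Toy illustration only: embeddings are plain number arrays supplied by the caller.
// In the real model they come from the image and text encoders.

function dot(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Scale a vector to unit length so that dot products become cosine similarities.
function normalize(v: number[]): number[] {
  const norm = Math.sqrt(dot(v, v));
  return norm === 0 ? v.slice() : v.map((x) => x / norm);
}

// Hypothetical result type: which label a region was matched to, and how strongly.
interface Match {
  region: number;
  label: string;
  similarity: number;
}

// For every region embedding, pick the text query with the highest cosine similarity.
function matchRegionsToQueries(
  regionEmbeddings: number[][],
  queryEmbeddings: number[][],
  labels: string[],
): Match[] {
  if (labels.length === 0 || labels.length !== queryEmbeddings.length) {
    throw new Error("need exactly one non-empty label per query embedding");
  }
  const queries = queryEmbeddings.map(normalize);
  return regionEmbeddings.map((region, index) => {
    const r = normalize(region);
    let best = 0;
    let bestSimilarity = -Infinity;
    for (let j = 0; j < queries.length; j++) {
      const s = dot(r, queries[j]);
      if (s > bestSimilarity) {
        bestSimilarity = s;
        best = j;
      }
    }
    return { region: index, label: labels[best], similarity: bestSimilarity };
  });
}
```

Because the labels enter only as query embeddings, changing the set of detectable categories means changing the text, not retraining the model.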

Model Capabilities

Zero-shot object detection
Multi-category object recognition
Text-guided image analysis
Bounding box prediction

Use Cases

Image analysis
Object detection
Detects specified categories of objects in images.
Returns detected object categories, confidence scores, and bounding box coordinates.
Content moderation
Sensitive content detection
Detects the presence of specific types of sensitive content in images.
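
Both use cases above come down to post-processing the returned categories, confidence scores, and bounding boxes. The following is a minimal sketch of that step, assuming each detection is an object with `label`, `score`, and `box` fields; the field names, the `moderate` helper, the threshold, and the sample data are illustrative assumptions rather than output from the model.

```typescript
// Assumed shape of one detection: category, confidence score, bounding box.
interface Box {
  xmin: number;
  ymin: number;
  xmax: number;
  ymax: number;
}

interface Detection {
  label: string;
  score: number;
  box: Box;
}

// Keep detections at or above minScore, highest confidence first.
// filter() returns a new array, so the input is not mutated by sort().
function filterDetections(detections: Detection[], minScore: number): Detection[] {
  return detections
    .filter((d) => d.score >= minScore)
    .sort((a, b) => b.score - a.score);
}

// Flag an image when any sufficiently confident detection carries a blocked label.
function moderate(
  detections: Detection[],
  blockedLabels: string[],
  minScore: number,
): { flagged: boolean; matches: Detection[] } {
  const blocked = new Set(blockedLabels.map((l) => l.toLowerCase()));
  const matches = filterDetections(detections, minScore).filter((d) =>
    blocked.has(d.label.toLowerCase()),
  );
  return { flagged: matches.length > 0, matches };
}

// Hypothetical sample data, not real model output.
const sample: Detection[] = [
  { label: "helmet", score: 0.42, box: { xmin: 10, ymin: 20, xmax: 110, ymax: 140 } },
  { label: "knife", score: 0.31, box: { xmin: 200, ymin: 80, xmax: 260, ymax: 190 } },
  { label: "knife", score: 0.05, box: { xmin: 5, ymin: 5, xmax: 15, ymax: 30 } },
];

const result = moderate(sample, ["knife"], 0.1);
console.log(result.flagged, result.matches.length);
```

In a browser setting the detections would typically come from a zero-shot object detection pipeline loaded with this model's ONNX weights, called with the image and the list of candidate labels. The right score threshold depends on the labels and images involved and is best tuned on real examples.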