Owlvit Base Patch32
OWL-ViT is a zero-shot object detection model built on the Vision Transformer (ViT). It can detect objects from new categories without fine-tuning.
Downloads 86
Release Time: 11/13/2023
Model Overview
Given an image and a set of text labels, this Transformer-based model locates the objects that match those labels. It needs no training on the specific categories being queried.
Model Features
Zero-shot detection capability
Detects objects from new categories without category-specific training or fine-tuning.
Text-guided detection
The object categories to detect are specified as text descriptions.
Transformer-based architecture
Adopts the Vision Transformer architecture, combining text and image information.
Web adaptation
Provides ONNX format weights for easy use in browser environments.
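As a rough illustration of text-guided detection, the sketch below runs the model through the Hugging Face transformers "zero-shot-object-detection" pipeline. This page does not name a checkpoint or a client library, so the checkpoint id google/owlvit-base-patch32, the "a photo of a ..." prompt template, and the helper names build_queries and detect are all assumptions made for the example.

```python
from typing import Any, Dict, List

# Assumed checkpoint id; this page does not state which repository hosts the weights.
MODEL_ID = "google/owlvit-base-patch32"


def build_queries(labels: List[str]) -> List[str]:
    """Turn bare category names into text prompts (hypothetical helper).

    The "a photo of a ..." template is an assumption; plain labels also work.
    Empty or whitespace-only labels are dropped.
    """
    return [f"a photo of a {label.strip()}" for label in labels if label.strip()]


def detect(image: Any, labels: List[str], threshold: float = 0.1) -> List[Dict[str, Any]]:
    """Run zero-shot detection on one image (hypothetical wrapper).

    `image` may be a PIL image, a local path, or a URL string.
    The import is kept inside the function so the module loads
    even when transformers is not installed.
    """
    from transformers import pipeline

    detector = pipeline("zero-shot-object-detection", model=MODEL_ID)
    # Each result is a dict with "score", "label", and a "box" dict
    # holding xmin, ymin, xmax, ymax in pixel coordinates.
    return detector(image, candidate_labels=build_queries(labels), threshold=threshold)
```

Calling detect("street.jpg", ["car", "bicycle"]) would download the weights on first use. The labels in the results are the full prompts that were passed in, not the bare category names.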
Model Capabilities
Zero-shot object detection
Multi-category object recognition
Text-guided image analysis
Bounding box prediction
Use Cases
Image analysis
Object detection
Detects specified categories of objects in images.
Returns detected object categories, confidence scores, and bounding box coordinates.
Content moderation
Sensitive content detection
Detects the presence of specific types of sensitive content in images.
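Both use cases above come down to inspecting the returned categories, confidence scores, and bounding boxes. The following is a minimal sketch of that post-processing step. It assumes each detection is a dict with "label", "score", and "box" keys, which is an assumed result format rather than one documented on this page. The helper names filter_detections and is_flagged and the 0.3 score cutoff are likewise illustrative.

```python
from typing import Any, Dict, Iterable, List

# Assumed detection shape:
# {"label": str, "score": float, "box": {"xmin": int, "ymin": int, "xmax": int, "ymax": int}}
Detection = Dict[str, Any]


def filter_detections(
    detections: Iterable[Detection],
    wanted_labels: Iterable[str],
    min_score: float = 0.3,  # illustrative cutoff, tune per application
) -> List[Detection]:
    """Keep detections whose label is wanted and whose score meets the cutoff.

    Label matching is case-insensitive. Results are sorted by descending score.
    """
    wanted = {label.lower() for label in wanted_labels}
    hits = [
        d for d in detections
        if d["label"].lower() in wanted and d["score"] >= min_score
    ]
    return sorted(hits, key=lambda d: d["score"], reverse=True)


def is_flagged(
    detections: Iterable[Detection],
    blocked_labels: Iterable[str],
    min_score: float = 0.3,
) -> bool:
    """Return True if any blocked category is detected at or above the cutoff."""
    return len(filter_detections(detections, blocked_labels, min_score)) > 0
```

For content moderation, is_flagged would typically be called with the same labels that were sent to the model as text queries, since the model only reports categories it was asked about.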