Owlvit Base Patch16

Developed by Google
OWL-ViT is a zero-shot text-conditioned object detection model that can detect objects in images via text queries.
Downloads 4,588
Release Time: 7/5/2022

Model Overview

OWL-ViT is a zero-shot text-conditioned object detection model based on a CLIP backbone, capable of detecting objects in images using one or more text queries without requiring training on specific categories.

Model Features

Zero-shot Detection Capability
Can detect new objects via text queries without training on specific categories
Multi-text Query Support
Supports detecting different objects in an image simultaneously using one or more text queries
Open-vocabulary Classification
Achieves open-vocabulary classification by replacing fixed classification layer weights with text embeddings
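
The open-vocabulary idea above can be illustrated with a minimal sketch. It assumes each candidate box already has an image embedding and each text query already has a text embedding in the same space. The function names, the toy vectors, and the use of plain cosine similarity with a fixed threshold are illustrative assumptions, not the model's actual implementation, which uses learned projections and score calibration not reproduced here.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def open_vocab_scores(box_embeddings, query_embeddings):
    """Score every box against every query.

    The normalized text embeddings play the role that fixed
    classification-layer weights would play in a closed-set detector.
    Returns a list of rows, one per box, with one cosine similarity per query.
    """
    boxes = [l2_normalize(b) for b in box_embeddings]
    queries = [l2_normalize(q) for q in query_embeddings]
    return [[sum(bi * qi for bi, qi in zip(b, q)) for q in queries] for b in boxes]


def detect(box_embeddings, query_embeddings, query_texts, threshold=0.8):
    """Keep boxes whose best-matching query scores at or above the threshold."""
    detections = []
    scores = open_vocab_scores(box_embeddings, query_embeddings)
    for box_index, row in enumerate(scores):
        best = max(range(len(row)), key=row.__getitem__)
        if row[best] >= threshold:
            detections.append((box_index, query_texts[best], row[best]))
    return detections


# Toy data (hypothetical): three candidate boxes, two text queries.
toy_boxes = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.1], [0.3, 0.3, 0.3]]
toy_queries = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
toy_texts = ["a photo of a cat", "a photo of a dog"]

detections = detect(toy_boxes, toy_queries, toy_texts)
for box_index, text, score in detections:
    print(f"box {box_index}: {text} ({score:.3f})")
```

Because the class set is just the list of query embeddings, adding a new category in this sketch means adding one more text embedding rather than retraining a classification layer.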

Model Capabilities

Zero-shot text-conditioned object detection
Image object localization
Multi-category simultaneous detection

Use Cases

Computer Vision Research
Zero-shot Object Detection Research
Used to study the model's detection capability on unseen categories
Interdisciplinary Applications
Special Object Recognition
Applicable in domains that require recognizing objects for which no labels were available during training