
OWLv2 Large Patch14 Finetuned

Developed by Google
OWLv2 is a zero-shot text-conditioned object detection model that can detect objects in images through text queries without requiring category-specific training data.
Downloads: 1,434
Release date: 10/14/2023

Model Overview

OWLv2 is a zero-shot text-conditioned object detection model built on a CLIP backbone, capable of detecting objects in images using one or more text queries. It employs ViT-L/14 as the visual encoder, is trained with a contrastive loss, and then fine-tuned on standard detection datasets.

Model Features

Zero-shot detection capability
Detects objects in images through text queries without requiring category-specific training data.
Open-vocabulary classification
Supports detection of arbitrary class names by replacing fixed classification layer weights with text embeddings.
Multi-query detection
Supports simultaneous detection of different objects in images using one or more text queries.
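The open-vocabulary mechanism described above can be sketched in a few lines of NumPy: the text-query embeddings stand in for a fixed classifier weight matrix, and each candidate box is scored against every query. All shapes and values here are toy placeholders, not real OWLv2 embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical shapes: 4 candidate boxes, 2 text queries, embedding dim 8.
# In OWLv2 the per-box embeddings come from the ViT image encoder; random
# vectors are used here purely to illustrate the scoring mechanism.
box_embeds = rng.normal(size=(4, 8))   # one embedding per predicted box
text_embeds = rng.normal(size=(2, 8))  # one embedding per text query

# L2-normalize, as in CLIP-style contrastive models.
box_embeds /= np.linalg.norm(box_embeds, axis=-1, keepdims=True)
text_embeds /= np.linalg.norm(text_embeds, axis=-1, keepdims=True)

# Cosine similarities act as classification logits: the text embeddings
# play the role of the otherwise fixed classification-layer weights.
logits = box_embeds @ text_embeds.T      # shape (4 boxes, 2 queries)
scores = 1 / (1 + np.exp(-logits))       # sigmoid -> per-query scores

# Keep boxes whose best query score clears a threshold (value arbitrary).
threshold = 0.5
best_query = scores.argmax(axis=-1)
keep = scores.max(axis=-1) > threshold
detections = [(int(i), int(best_query[i])) for i in np.flatnonzero(keep)]
```

Because the "classifier" is just a matrix of text embeddings, swapping in new query strings changes the detectable vocabulary without retraining, which is what enables both zero-shot and multi-query detection.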

Model Capabilities

Text-conditioned object detection
Open-vocabulary object recognition
Multi-category simultaneous detection

Use Cases

Computer vision research
Zero-shot object detection research
Used to study the model's detection capability on unseen categories.
Interdisciplinary applications
Special scenario object recognition
Performs object detection in specialized fields (e.g., medical, industrial) where training data is difficult to obtain.