# Zero-shot Object Detection
Llmdet Swin Large Hf
Apache-2.0
LLMDet is an open-vocabulary object detector trained under supervision from large language models; the paper was a CVPR 2025 highlight.
Object Detection
fushh7
3,428
1
Llmdet Swin Base Hf
Apache-2.0
LLMDet is an open-vocabulary object detector supervised by large language models, capable of zero-shot object detection.
Object Detection
Safetensors
fushh7
605
0
VLM R1 Qwen2.5VL 3B OVD 0321
Apache-2.0
A zero-shot object detection model based on Qwen2.5-VL-3B-Instruct, enhanced with VLM-R1 reinforcement learning, supporting open vocabulary detection tasks.
Text-to-Image
Safetensors, English
omlab
892
11
Inference Endpoint For Omdet Turbo Swin Tiny Hf
Apache-2.0
A zero-shot object detection model based on the Swin-Tiny architecture, supporting French and English, suitable for various object detection tasks.
Object Detection
Transformers, Supports Multiple Languages

Blueway
199
1
Yoloe 11l Seg
YOLOE is a real-time visual omni-model that supports various vision tasks including zero-shot object detection.
Object Detection
jameslahm
219
2
Yoloe V8l Seg
YOLOE is a real-time visual omni-model that combines object detection and visual understanding capabilities, suitable for various visual tasks.
Object Detection
jameslahm
4,135
1
Yoloe V8s Seg
YOLOE is a zero-shot object detection model capable of detecting various objects in visual scenes in real-time.
Object Detection
jameslahm
28
0
Qwen2.5vl 3B VLM R1 REC 500steps
A vision-language model based on Qwen2.5-VL-3B-Instruct, enhanced with VLM-R1 reinforcement learning, focusing on referring expression comprehension tasks.
Text-to-Image
Safetensors, English
omlab
976
22
Grounding Dino Tiny ONNX
Apache-2.0
A lightweight zero-shot object detection model in ONNX format, compatible with Transformers.js, suitable for browser-side deployment.
Object Detection
Transformers

onnx-community
98
1
Omdet Turbo Swin Tiny Hf
Apache-2.0
OmDet-Turbo is a real-time, Transformer-based open-vocabulary detection model built around an efficient fusion head, suitable for zero-shot object detection tasks.
Object Detection
omlab
36.29k
33
Owlv2 Base Patch16
OWLv2 is a vision-language pre-trained model focused on object detection and localization tasks.
Object Detection
Transformers

Xenova
17
0
Owlv2 Base Patch16 Ensemble
Apache-2.0
OWLv2 is a zero-shot text-conditioned object detection model that can locate objects in images through text queries.
Object Detection
Transformers

upfeatmediainc
15
0
Owlv2 Large Patch14 Ensemble
Apache-2.0
OWLv2 is a zero-shot text-conditioned object detection model that can locate objects in images through text queries.
Text-to-Image
Transformers

google
262.77k
25
Owlv2 Large Patch14
Apache-2.0
OWLv2 is a zero-shot text-conditioned object detection model that can detect objects in images through text queries without requiring category-specific training data.
Text-to-Image
Transformers

google
3,679
5
Grounding Dino Base
Apache-2.0
Grounding DINO is an open-set object detection model that achieves zero-shot object detection capabilities by combining the DINO detector with a text encoder.
Object Detection
Transformers

IDEA-Research
1.1M
87
Owlvit Large Patch14
Apache-2.0
OWL-ViT is a zero-shot text-conditioned object detection model that can retrieve objects in images through text queries.
Text-to-Image
Transformers

google
25.01k
25
Owlvit Base Patch16
Apache-2.0
OWL-ViT is a zero-shot text-conditioned object detection model that can detect objects in images via text queries.
Text-to-Image
Transformers

google
4,588
12