VLM - R1 - Qwen2.5VL - 3B - OVD - 0321 Open-source Object Detection Model, Supports Open-vocabulary Detection Tasks!

VLM R1 Qwen2.5VL 3B OVD 0321

Developed by omlab

A zero-shot object detection model based on Qwen2.5-VL-3B-Instruct, enhanced with VLM-R1 reinforcement learning, supporting open vocabulary detection tasks.

Text-to-Image

Safetensors

EnglishOpen Source License:Apache-2.0 #Zero-shot Object Detection #VLM-R1 Reinforcement Learning #Multimodal Vision-Language

Downloads 892

Release Time : 3/21/2025

Model Overview

This model combines vision-language models with reinforcement learning techniques, specifically designed for Open Vocabulary Detection (OVD), capable of recognizing new category objects not explicitly labeled in the training data.

Model Features

Reinforcement Learning Enhancement

Optimizes model performance using the VLM-R1 reinforcement learning algorithm

Open Vocabulary Detection

Supports recognizing new category objects not included in the training data

Multimodal Understanding

Combines visual and linguistic information for object detection

Model Capabilities

Zero-shot Object Detection

Open Vocabulary Recognition

Multimodal Understanding

Vision-Language Reasoning

Use Cases

Computer Vision

Smart Surveillance

Detects unknown category objects in surveillance footage

Autonomous Driving

Identifies new types of obstacles in road environments not covered by training data

Retail Analytics

Product Recognition

Identifies categories and attributes of newly launched products

Property	Details
Model Type	Qwen/Qwen2.5-VL-3B-Instruct
Training Data	omlab/OVDEval
Pipeline Tag	zero-shot-object-detection

Featured Recommended AI Models

Empowering the Future, Your AI Solution Knowledge Base

English 简体中文繁體中文にほんご

VLM R1 Qwen2.5VL 3B OVD 0321

Model Overview

Model Features

Model Capabilities

Use Cases

🚀 Qwen 2.5VL 3B Enhanced for OVD

🚀 Quick Start

📚 Documentation

📄 License

📖 Citation