VLM R1 Qwen2.5VL 3B OVD 0321
Developed by omlab
A zero-shot object detection model based on Qwen2.5-VL-3B-Instruct and enhanced with VLM-R1 reinforcement learning for open vocabulary detection tasks.
Downloads 892
Release Time: 3/21/2025
Model Overview
This model combines a vision-language model with reinforcement learning and is designed specifically for Open Vocabulary Detection (OVD): it can detect objects from categories that were not explicitly labeled in the training data.
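As a rough illustration of how an open vocabulary query might be posed and its answer consumed, the sketch below builds a text prompt from arbitrary category names and parses a JSON list of boxes out of the reply. The prompt wording, the bbox_2d and label keys, and the helper names build_ovd_prompt and parse_detections are assumptions made for this example, not the documented interface of the model. The model call itself is left out.

```python
import json
import re


def build_ovd_prompt(categories):
    """Build a text prompt asking for boxes of the given category names.

    The wording is hypothetical; the real model may expect a different prompt.
    """
    names = ", ".join(categories)
    return (
        f"Please detect the following objects in the image: {names}. "
        "Answer with a JSON list of objects, each with 'bbox_2d' "
        "([x1, y1, x2, y2]) and 'label'. If none are present, answer with []."
    )


def parse_detections(response_text, categories):
    """Extract the outermost JSON list from the reply and keep well-formed boxes.

    Assumes (not confirmed by the source) that each detection is an object
    with a 4-number 'bbox_2d' and a string 'label'.
    """
    match = re.search(r"\[.*\]", response_text, re.DOTALL)
    if match is None:
        return []
    try:
        items = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []

    allowed = set(categories)
    detections = []
    for item in items:
        if not isinstance(item, dict):
            continue
        box = item.get("bbox_2d")
        label = item.get("label")
        is_box = (
            isinstance(box, list)
            and len(box) == 4
            and all(isinstance(v, (int, float)) for v in box)
        )
        # Drop labels that were not asked for and malformed boxes.
        if label in allowed and is_box:
            detections.append({"label": label, "bbox_2d": [float(v) for v in box]})
    return detections
```

Because the category names are plain text, querying a new category only requires changing the list passed to build_ovd_prompt; no retraining step is implied by this sketch.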
Model Features
Reinforcement Learning Enhancement
Uses VLM-R1 reinforcement learning to improve detection performance
Open Vocabulary Detection
Recognizes objects from new categories that are not included in the training data
Multimodal Understanding
Combines visual and linguistic information for object detection
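To make the idea of reinforcement learning for detection more concrete, the sketch below scores a set of predicted boxes against ground-truth boxes and returns a single number that could serve as a reward. This is a generic IoU-based F1 score written for illustration. The page does not describe the actual VLM-R1 reward, so the function names, the 0.5 threshold, and the matching rule are all assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def detection_reward(predicted, ground_truth, iou_threshold=0.5):
    """Hypothetical reward: F1 score of label-matched boxes above an IoU threshold.

    Each detection is a dict with 'label' and 'bbox_2d'. A ground-truth box
    can be matched by at most one prediction (greedy, in prediction order).
    """
    if not predicted and not ground_truth:
        return 1.0  # correctly predicting "nothing present"
    matched = set()
    true_positives = 0
    for pred in predicted:
        best_idx, best_iou = None, iou_threshold
        for idx, gt in enumerate(ground_truth):
            if idx in matched or gt["label"] != pred["label"]:
                continue
            overlap = iou(pred["bbox_2d"], gt["bbox_2d"])
            if overlap >= best_iou:
                best_idx, best_iou = idx, overlap
        if best_idx is not None:
            matched.add(best_idx)
            true_positives += 1
    # F1 = 2 * TP / (number of predictions + number of ground-truth boxes)
    return 2.0 * true_positives / (len(predicted) + len(ground_truth))
```

A score of this kind penalizes both missed objects and spurious boxes, which is one reason an F1-style measure is a common choice when a single scalar has to summarize detection quality.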
Model Capabilities
Zero-shot Object Detection
Open Vocabulary Recognition
Multimodal Understanding
Vision-Language Reasoning
Use Cases
Computer Vision
Smart Surveillance
Detects objects from previously unseen categories in surveillance footage
Autonomous Driving
Identifies new types of obstacles in road environments not covered by training data
Retail Analytics
Product Recognition
Identifies categories and attributes of newly launched products