RT-DETRv2 (rtdetr_v2_r101vd): Open-Source Object Detection Model

Developed by PekingU

RT-DETRv2 is an improved real-time object detection model based on the DETR architecture, optimizing detection performance through innovations like selective multi-scale feature extraction and dynamic data augmentation.

Object Detection

Transformers

Language: English

Open Source License: Apache-2.0

Tags: #Real-time object detection #Multi-scale feature extraction #Dynamic data augmentation

Downloads 1,892

Release Time: 1/31/2025

Model Overview

This model improves the flexibility and practicality of real-time object detection while keeping real-time performance. It does so through selective multi-scale feature extraction and a discrete sampling operator that is compatible with a broader range of deployment targets.

Model Features

Selective multi-scale feature extraction

Optimizes the feature extraction process, improving detection capability for objects of different scales.

More compatible discrete sampling operator

Provides a discrete sampling operator for broader deployment compatibility, improving the model's adaptability to various scenarios.

Dynamic data augmentation

Employs dynamic data augmentation strategies to improve the model's generalization ability.

Real-time performance optimization

Improves detection accuracy over its predecessor, RT-DETR, while maintaining the same real-time detection speed.

Model Capabilities

Real-time object detection

Multi-scale object recognition

Complex scene analysis

Use Cases

Autonomous driving

Road object detection

Real-time detection of vehicles, pedestrians, and other objects on the road.

High-precision identification of various traffic participants.

Security surveillance

Abnormal behavior recognition

Real-time monitoring of suspicious activities in video streams.

Quick identification of potential security threats.

Retail analytics

Shelf product detection

Automatic identification of product distribution on shelves.

Optimizes inventory management and product placement.

🚀 RT-DETRv2

The RT-DETRv2 model refines real-time object detection, enhancing flexibility and practicality while maintaining high performance.

🚀 Quick Start

The RT-DETRv2 model was proposed in "RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer" by Wenyu Lv, Yian Zhao, Qinyao Chang, Kui Huang, Guanzhong Wang, and Yi Liu. RT-DETRv2 refines RT-DETR by introducing selective multi-scale feature extraction, a discrete sampling operator for broader deployment compatibility, and improved training strategies such as dynamic data augmentation and scale-adaptive hyperparameters. These changes enhance flexibility and practicality while maintaining real-time performance.

This model was contributed by @jadechoghari with the help of @cyrilvallez and @qubvel-hf.

✨ Features

Performance

RT-DETRv2 consistently outperforms its predecessor across all model sizes while maintaining the same real-time speeds.

Applications

RT-DETRv2 is well suited to real-time object detection in applications such as autonomous driving, surveillance systems, robotics, and retail analytics. Its enhanced flexibility and deployment-friendly design make it suitable for both edge devices and large-scale systems, and it delivers high accuracy and speed in dynamic, real-world environments.

💻 Usage Examples

Basic Usage

import torch
import requests

from PIL import Image
from transformers import RTDetrV2ForObjectDetection, RTDetrImageProcessor

url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)

image_processor = RTDetrImageProcessor.from_pretrained("PekingU/rtdetr_v2_r101vd")
model = RTDetrV2ForObjectDetection.from_pretrained("PekingU/rtdetr_v2_r101vd")

inputs = image_processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

results = image_processor.post_process_object_detection(
    outputs,
    target_sizes=torch.tensor([(image.height, image.width)]),
    threshold=0.5,
)

for result in results:
    for score, label_id, box in zip(result["scores"], result["labels"], result["boxes"]):
        score, label = score.item(), label_id.item()
        box = [round(i, 2) for i in box.tolist()]
        print(f"{model.config.id2label[label]}: {score:.2f} {box}")
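
The loop above prints each detection as it is read. For downstream use it can be handier to collect detections into plain records. The following is a minimal sketch, not part of the model's API: `to_records` is a hypothetical helper name, and it assumes the post-processed boxes are absolute (x_min, y_min, x_max, y_max) pixel coordinates and that the tensors have already been converted with `.tolist()`.

```python
def to_records(scores, labels, boxes, id2label):
    """Convert parallel lists of detections into a list of dicts.

    Assumption: each box is (x_min, y_min, x_max, y_max) in absolute pixels.
    """
    records = []
    for score, label_id, (x_min, y_min, x_max, y_max) in zip(scores, labels, boxes):
        records.append({
            "label": id2label[label_id],
            "score": score,
            # COCO-style box: top-left corner plus width and height
            "bbox_xywh": [x_min, y_min, x_max - x_min, y_max - y_min],
        })
    # Highest-confidence detections first
    return sorted(records, key=lambda r: r["score"], reverse=True)
```

With the variables from the basic example, a call could look like `to_records(result["scores"].tolist(), result["labels"].tolist(), result["boxes"].tolist(), model.config.id2label)`.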

Advanced Usage

# The basic example uses a fixed score threshold of 0.5.
# For other scenarios, adjust the post-processing threshold or load a different
# pretrained checkpoint. As an illustration, this variant lowers the threshold
# to 0.3 so that lower-confidence detections are also reported.

import torch
import requests

from PIL import Image
from transformers import RTDetrV2ForObjectDetection, RTDetrImageProcessor

url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)

# Replace with another pretrained checkpoint if needed
checkpoint = "PekingU/rtdetr_v2_r101vd"
image_processor = RTDetrImageProcessor.from_pretrained(checkpoint)
model = RTDetrV2ForObjectDetection.from_pretrained(checkpoint)

inputs = image_processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# A lower threshold keeps more candidate boxes; raise it to keep only confident ones
results = image_processor.post_process_object_detection(
    outputs,
    target_sizes=torch.tensor([(image.height, image.width)]),
    threshold=0.3,
)

for result in results:
    for score, label_id, box in zip(result["scores"], result["labels"], result["boxes"]):
        score, label = score.item(), label_id.item()
        box = [round(i, 2) for i in box.tolist()]
        print(f"{model.config.id2label[label]}: {score:.2f} {box}")

🔧 Technical Details

RT-DETRv2 is trained on COCO (Lin et al. [2014]) train2017 and validated on the COCO val2017 dataset. We report the standard AP metrics (averaged over uniformly sampled IoU thresholds ranging from 0.50 to 0.95 with a step size of 0.05), and AP50 on the validation set (AP^val_50), which is commonly used in real scenarios.
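
To make the IoU thresholds concrete, here is a small illustrative sketch. It is not the COCO evaluation itself (full AP also needs precision-recall curves over ranked detections, which dedicated evaluation tools compute); it only shows how IoU is calculated for two boxes and which of the ten thresholds a given prediction would satisfy. The function names are hypothetical, and boxes are assumed to be (x_min, y_min, x_max, y_max).

```python
def box_iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Ten IoU thresholds: 0.50, 0.55, ..., 0.95
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def matched_thresholds(pred_box, gt_box):
    """Thresholds at which pred_box counts as a match for gt_box."""
    iou = box_iou(pred_box, gt_box)
    return [t for t in IOU_THRESHOLDS if iou >= t]
```

A prediction that overlaps its ground-truth box loosely passes only the lower thresholds, which is why AP averaged over 0.50 to 0.95 rewards tighter localization than AP50 alone.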

📄 License

This project is under the Apache-2.0 license.
