đ RT-DETRv2
The RT-DETRv2 model refines real-time object detection, enhancing flexibility and practicality while maintaining high performance.
đ Quick Start
The RT-DETRv2 model was proposed in RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer by Wenyu Lv, Yian Zhao, Qinyao Chang, Kui Huang, Guanzhong Wang, Yi Liu. RT-DETRv2 refines RT-DETR by introducing selective multi-scale feature extraction, a discrete sampling operator for broader deployment compatibility, and improved training strategies like dynamic data augmentation and scale-adaptive hyperparameters.
These changes enhance flexibility and practicality while maintaining real-time performance.
This model was contributed by @jadechoghari with the help of @cyrilvallez and @qubvel-hf
⨠Features
Performance
RT-DETRv2 consistently outperforms its predecessor across all model sizes while maintaining the same real-time speeds.

Applications
RT-DETRv2 is ideal for real-time object detection in diverse applications such as autonomous driving, surveillance systems, robotics, and retail analytics. Its enhanced flexibility and deployment-friendly design make it suitable for both edge devices and large-scale systems + ensures high accuracy and speed in dynamic, real-world environments.
đģ Usage Examples
Basic Usage
import torch
import requests
from PIL import Image
from transformers import RTDetrV2ForObjectDetection, RTDetrImageProcessor
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)
image_processor = RTDetrImageProcessor.from_pretrained("PekingU/rtdetr_v2_r101vd")
model = RTDetrV2ForObjectDetection.from_pretrained("PekingU/rtdetr_v2_r101vd")
inputs = image_processor(images=image, return_tensors="pt")
with torch.no_grad():
outputs = model(**inputs)
results = image_processor.post_process_object_detection(outputs, target_sizes=torch.tensor([(image.height, image.width)]), threshold=0.5)
for result in results:
for score, label_id, box in zip(result["scores"], result["labels"], result["boxes"]):
score, label = score.item(), label_id.item()
box = [round(i, 2) for i in box.tolist()]
print(f"{model.config.id2label[label]}: {score:.2f} {box}")
Advanced Usage
import torch
import requests
from PIL import Image
from transformers import RTDetrV2ForObjectDetection, RTDetrImageProcessor
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)
image_processor = RTDetrImageProcessor.from_pretrained("PekingU/rtdetr_v2_r101vd")
model = RTDetrV2ForObjectDetection.from_pretrained("PekingU/rtdetr_v2_r101vd")
inputs = image_processor(images=image, return_tensors="pt")
with torch.no_grad():
outputs = model(**inputs)
results = image_processor.post_process_object_detection(outputs, target_sizes=torch.tensor([(image.height, image.width)]), threshold=0.5)
for result in results:
for score, label_id, box in zip(result["scores"], result["labels"], result["boxes"]):
score, label = score.item(), label_id.item()
box = [round(i, 2) for i in box.tolist()]
print(f"{model.config.id2label[label]}: {score:.2f} {box}")
đ§ Technical Details
RT-DETRv2 is trained on COCO (Lin et al. [2014]) train2017 and validated on COCO val2017 dataset. We report the standard AP metrics (averaged over uniformly sampled IoU thresholds ranging from 0.50 â 0.95 with a step size of 0.05), and APval50 commonly used in real scenarios.
đ License
This project is under the Apache-2.0 license.