# D-FINE

D-FINE is a powerful real-time object detector that redefines the bounding box regression task in DETR models to achieve outstanding localization precision. This README provides an overview, performance details, usage examples, training information, and application scenarios for the D-FINE model.
## Features
- **Innovative design:** Redefines the regression task in DETRs as fine-grained distribution refinement.
- **High precision:** Achieves outstanding localization precision.
- **Two key components:** Comprises Fine-grained Distribution Refinement (FDR) and Global Optimal Localization Self-Distillation (GO-LSD).
- **Versatile applications:** Ideal for real-time object detection in fields such as autonomous driving, surveillance systems, robotics, and retail analytics.
## Installation

D-FINE is available through the Hugging Face `transformers` library. Install it together with PyTorch: `pip install transformers torch`.
## Usage Examples

### Basic Usage
```python
import torch
import requests
from PIL import Image
from transformers import DFineForObjectDetection, AutoImageProcessor

# Load a sample image from the COCO validation set.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

image_processor = AutoImageProcessor.from_pretrained("ustc-community/dfine-nano-coco")
model = DFineForObjectDetection.from_pretrained("ustc-community/dfine-nano-coco")

inputs = image_processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert raw outputs to (score, label, box) detections; boxes are in
# (x_min, y_min, x_max, y_max) pixel coordinates of the original image.
results = image_processor.post_process_object_detection(
    outputs, target_sizes=torch.tensor([image.size[::-1]]), threshold=0.3
)

for result in results:
    for score, label_id, box in zip(result["scores"], result["labels"], result["boxes"]):
        score, label = score.item(), label_id.item()
        box = [round(i, 2) for i in box.tolist()]
        print(f"{model.config.id2label[label]}: {score:.2f} {box}")
```
## Documentation

### Overview
The D-FINE model was proposed in D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement by Yansong Peng, Hebei Li, Peixi Wu, Yueyi Zhang, Xiaoyan Sun, and Feng Wu. It was contributed by [VladOS95-cyber](https://github.com/VladOS95-cyber) with the help of [@qubvel-hf](https://huggingface.co/qubvel-hf). This is the Hugging Face Transformers implementation of D-FINE.
Checkpoint name suffixes indicate the training data:

- `_coco` → model trained on COCO
- `_obj365` → model trained on Objects365
- `_obj2coco` → model trained on Objects365 and then finetuned on COCO
### Performance

D-FINE is a powerful real-time object detector that achieves outstanding localization precision by redefining the bounding box regression task in DETR models. It comprises two key components: Fine-grained Distribution Refinement (FDR) and Global Optimal Localization Self-Distillation (GO-LSD).

### Training

D-FINE is trained on the COCO (Lin et al. [2014]) train2017 split and validated on val2017. We report the standard AP metric (averaged over uniformly sampled IoU thresholds from 0.50 to 0.95 with a step size of 0.05), as well as AP evaluated on the 5000-image val2017 set (APval5000), which is commonly used in real-world scenarios.
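As a quick illustration of how the headline AP number is formed (a sketch of the averaging step only, not the official COCO evaluator, and the per-threshold values below are made up):

```python
# IoU thresholds 0.50, 0.55, ..., 0.95 — ten values in steps of 0.05.
thresholds = [round(0.50 + 0.05 * i, 2) for i in range(10)]

# Pretend per-threshold AP values from an evaluator (illustrative numbers:
# AP typically drops as the IoU threshold gets stricter).
per_threshold_ap = {t: 0.60 - 0.4 * (t - 0.50) for t in thresholds}

# The reported AP is the mean over the ten thresholds.
mean_ap = sum(per_threshold_ap.values()) / len(per_threshold_ap)
print(f"AP@[0.50:0.95] = {mean_ap:.3f}")
```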
### Applications

D-FINE is ideal for real-time object detection in diverse applications such as autonomous driving, surveillance systems, robotics, and retail analytics. Its flexibility and deployment-friendly design make it suitable for both edge devices and large-scale systems, while ensuring high accuracy and speed in dynamic, real-world environments.
## Technical Details

The D-FINE model redefines the bounding box regression task in DETR models. It consists of two main components, Fine-grained Distribution Refinement (FDR) and Global Optimal Localization Self-Distillation (GO-LSD), which work together to achieve high-precision object localization. The model is trained on datasets such as COCO and Objects365, and can be finetuned on other datasets to adapt to various scenarios.
## License

This project is licensed under Apache-2.0.