🚀 D-FINE
D-FINE is a powerful real-time object detector. It redefines the bounding box regression task in DETR models to achieve outstanding localization precision. This README provides an overview of the D-FINE model, its performance, usage examples, training details, and applications.
✨ Features
- High Precision: Redefines the bounding box regression task in DETR models to achieve outstanding localization precision.
- Two Key Components: Comprises Fine-grained Distribution Refinement (FDR) and Global Optimal Localization Self-Distillation (GO-LSD).
- Trained on Multiple Datasets: Can be trained on COCO, Object365, or a combination of both.
- Versatile Applications: Ideal for real-time object detection in autonomous driving, surveillance systems, robotics, and retail analytics.
📦 Installation
The original README does not include installation instructions. To install the required libraries, use pip:

```bash
pip install transformers torch requests pillow
```
💻 Usage Examples
Basic Usage
```python
import torch
import requests
from PIL import Image
from transformers import DFineForObjectDetection, AutoImageProcessor

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

image_processor = AutoImageProcessor.from_pretrained("ustc-community/dfine-medium-obj2coco")
model = DFineForObjectDetection.from_pretrained("ustc-community/dfine-medium-obj2coco")

inputs = image_processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

results = image_processor.post_process_object_detection(
    outputs, target_sizes=torch.tensor([image.size[::-1]]), threshold=0.3
)

for result in results:
    for score, label_id, box in zip(result["scores"], result["labels"], result["boxes"]):
        score, label = score.item(), label_id.item()
        box = [round(i, 2) for i in box.tolist()]
        print(f"{model.config.id2label[label]}: {score:.2f} {box}")
```
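A common next step is drawing the post-processed boxes back onto the image. The sketch below uses Pillow's `ImageDraw` with hypothetical detection values (the label, score, and box are invented for illustration; real values come from `post_process_object_detection` as in the example above):

```python
from PIL import Image, ImageDraw

def draw_detections(image, labels, scores, boxes):
    """Draw (xmin, ymin, xmax, ymax) boxes with 'label: score' captions."""
    annotated = image.copy()
    draw = ImageDraw.Draw(annotated)
    for label, score, box in zip(labels, scores, boxes):
        draw.rectangle(box, outline="red", width=2)
        draw.text((box[0], max(box[1] - 12, 0)), f"{label}: {score:.2f}", fill="red")
    return annotated

# Hypothetical detections in the same format the post-processing step returns
image = Image.new("RGB", (640, 480), "white")
annotated = draw_detections(image, ["cat"], [0.97], [(50.0, 60.0, 300.0, 400.0)])
annotated.save("detections.png")
```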
📚 Documentation
Overview
The D-FINE model was proposed in D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement by Yansong Peng, Hebei Li, Peixi Wu, Yueyi Zhang, Xiaoyan Sun, and Feng Wu. This model was contributed by VladOS95-cyber with the help of @qubvel-hf. This is the Hugging Face Transformers implementation of D-FINE. Checkpoint names follow this suffix convention:
- `_coco` -> model trained on COCO
- `_obj365` -> model trained on Object365
- `_obj2coco` -> model trained on Object365 and then finetuned on COCO
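The suffix convention above can be turned into a small helper for building checkpoint names. This is only a sketch: apart from `ustc-community/dfine-medium-obj2coco`, which appears in the usage example, the existence of any particular size/variant combination on the Hub is an assumption you should verify:

```python
# Hypothetical helper following the checkpoint naming convention described above.
# Only "ustc-community/dfine-medium-obj2coco" is confirmed by this README;
# other combinations are assumptions.
VARIANTS = {"coco", "obj365", "obj2coco"}

def dfine_checkpoint(size: str, variant: str) -> str:
    """Build a Hub checkpoint name, e.g. dfine_checkpoint('medium', 'obj2coco')."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {sorted(VARIANTS)}")
    return f"ustc-community/dfine-{size}-{variant}"

print(dfine_checkpoint("medium", "obj2coco"))  # ustc-community/dfine-medium-obj2coco
```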
Performance
D-FINE is a powerful real-time object detector that achieves outstanding localization precision by redefining the bounding box regression task in DETR models. D-FINE comprises two key components: Fine-grained Distribution Refinement (FDR) and Global Optimal Localization Self-Distillation (GO-LSD).

Training
D-FINE is trained on the COCO (Lin et al. [2014]) train2017 split and validated on the COCO val2017 split. We report the standard AP metric (averaged over IoU thresholds uniformly sampled from 0.50 to 0.95 with a step size of 0.05), as well as APval_5000, which is commonly used in real-world scenarios.
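The COCO-style averaging described above can be made concrete with a short sketch. The per-threshold AP values below are invented purely for illustration; real values come from an evaluator such as pycocotools:

```python
# The 10 uniformly sampled IoU thresholds: 0.50, 0.55, ..., 0.95
THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

def coco_ap(ap_per_iou: dict) -> float:
    """Average AP over the 10 IoU thresholds (the standard COCO AP)."""
    return sum(ap_per_iou[t] for t in THRESHOLDS) / len(THRESHOLDS)

# Hypothetical per-threshold AP values for illustration only
example = {t: 0.60 - 0.02 * i for i, t in enumerate(THRESHOLDS)}
print(round(coco_ap(example), 3))  # 0.51
```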
Applications
D-FINE is ideal for real-time object detection in diverse applications such as autonomous driving, surveillance systems, robotics, and retail analytics. Its enhanced flexibility and deployment-friendly design make it suitable for both edge devices and large-scale systems, ensuring high accuracy and speed in dynamic, real-world environments.
📄 License
This project is licensed under the Apache-2.0 license.