# D-FINE

D-FINE is a powerful real-time object detector that redefines the bounding box regression task in DETR models to achieve outstanding localization precision. This README provides an overview, performance details, usage examples, training information, and application scenarios for the D-FINE model.
## Features
- **Innovative design:** Redefines the regression task in DETRs as fine-grained distribution refinement.
- **High precision:** Achieves outstanding localization precision.
- **Two key components:** Comprises Fine-grained Distribution Refinement (FDR) and Global Optimal Localization Self-Distillation (GO-LSD).
- **Versatile applications:** Ideal for real-time object detection in fields such as autonomous driving, surveillance systems, robotics, and retail analytics.
## Installation

D-FINE is available through the Hugging Face `transformers` library. Install it together with PyTorch: `pip install transformers torch`.
## Usage Examples

### Basic Usage
```python
import torch
import requests
from PIL import Image
from transformers import DFineForObjectDetection, AutoImageProcessor

# Load a sample image from the COCO validation set.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

image_processor = AutoImageProcessor.from_pretrained("ustc-community/dfine-nano-coco")
model = DFineForObjectDetection.from_pretrained("ustc-community/dfine-nano-coco")

inputs = image_processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert raw outputs to (score, label, box) detections; boxes are in
# (x_min, y_min, x_max, y_max) pixel coordinates of the original image.
results = image_processor.post_process_object_detection(
    outputs, target_sizes=torch.tensor([image.size[::-1]]), threshold=0.3
)

for result in results:
    for score, label_id, box in zip(result["scores"], result["labels"], result["boxes"]):
        score, label = score.item(), label_id.item()
        box = [round(i, 2) for i in box.tolist()]
        print(f"{model.config.id2label[label]}: {score:.2f} {box}")
```
## Documentation

### Overview
The D-FINE model was proposed in D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement by Yansong Peng, Hebei Li, Peixi Wu, Yueyi Zhang, Xiaoyan Sun, and Feng Wu. It was contributed by [VladOS95-cyber](https://github.com/VladOS95-cyber) with the help of [@qubvel-hf](https://huggingface.co/qubvel-hf). This is the Hugging Face Transformers implementation of D-FINE.
Checkpoint name suffixes indicate the training data:

- `_coco` → model trained on COCO
- `_obj365` → model trained on Objects365
- `_obj2coco` → model trained on Objects365 and then finetuned on COCO
### Performance

D-FINE is a powerful real-time object detector that achieves outstanding localization precision by redefining the bounding box regression task in DETR models. It comprises two key components: Fine-grained Distribution Refinement (FDR) and Global Optimal Localization Self-Distillation (GO-LSD).

### Training

D-FINE is trained on the COCO (Lin et al. [2014]) train2017 split and validated on val2017. We report the standard AP metric (averaged over uniformly sampled IoU thresholds from 0.50 to 0.95 with a step size of 0.05), as well as AP evaluated on the 5000-image val2017 set (APval5000), which is commonly used in real-world scenarios.
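As a quick illustration of how the headline AP number is formed (a sketch of the averaging step only, not the official COCO evaluator, and the per-threshold values below are made up):

```python
# IoU thresholds 0.50, 0.55, ..., 0.95 — ten values in steps of 0.05.
thresholds = [round(0.50 + 0.05 * i, 2) for i in range(10)]

# Pretend per-threshold AP values from an evaluator (illustrative numbers:
# AP typically drops as the IoU threshold gets stricter).
per_threshold_ap = {t: 0.60 - 0.4 * (t - 0.50) for t in thresholds}

# The reported AP is the mean over the ten thresholds.
mean_ap = sum(per_threshold_ap.values()) / len(per_threshold_ap)
print(f"AP@[0.50:0.95] = {mean_ap:.3f}")
```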
### Applications

D-FINE is ideal for real-time object detection in diverse applications such as autonomous driving, surveillance systems, robotics, and retail analytics. Its flexibility and deployment-friendly design make it suitable for both edge devices and large-scale systems, while ensuring high accuracy and speed in dynamic, real-world environments.
## Technical Details

The D-FINE model redefines the bounding box regression task in DETR models. It consists of two main components, Fine-grained Distribution Refinement (FDR) and Global Optimal Localization Self-Distillation (GO-LSD), which work together to achieve high-precision object localization. The model is trained on datasets such as COCO and Objects365, and can be finetuned on other datasets to adapt to various scenarios.
## License

This project is licensed under Apache-2.0.