🚀 D-FINE
D-FINE is a powerful real-time object detector. It redefines the bounding box regression task in DETR models to achieve outstanding localization precision. This README provides an overview of the D-FINE model, its performance, usage examples, training details, and applications.
✨ Features
- High Precision: Redefines the bounding box regression task in DETR models to achieve outstanding localization precision.
- Two Key Components: Comprises Fine-grained Distribution Refinement (FDR) and Global Optimal Localization Self-Distillation (GO-LSD).
- Trained on Multiple Datasets: Can be trained on COCO, Object365, or a combination of both.
- Versatile Applications: Ideal for real-time object detection in autonomous driving, surveillance systems, robotics, and retail analytics.
📦 Installation
The original README does not include installation instructions. To install the required libraries, use pip:

```bash
pip install transformers torch requests pillow
```
💻 Usage Examples
Basic Usage
```python
import torch
import requests
from PIL import Image
from transformers import DFineForObjectDetection, AutoImageProcessor

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

image_processor = AutoImageProcessor.from_pretrained("ustc-community/dfine-medium-obj2coco")
model = DFineForObjectDetection.from_pretrained("ustc-community/dfine-medium-obj2coco")

inputs = image_processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

results = image_processor.post_process_object_detection(
    outputs, target_sizes=torch.tensor([image.size[::-1]]), threshold=0.3
)

for result in results:
    for score, label_id, box in zip(result["scores"], result["labels"], result["boxes"]):
        score, label = score.item(), label_id.item()
        box = [round(i, 2) for i in box.tolist()]
        print(f"{model.config.id2label[label]}: {score:.2f} {box}")
```
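A common next step is drawing the post-processed boxes back onto the image. The sketch below uses Pillow's `ImageDraw` with hypothetical detection values (the label, score, and box are invented for illustration; real values come from `post_process_object_detection` as in the example above):

```python
from PIL import Image, ImageDraw

def draw_detections(image, labels, scores, boxes):
    """Draw (xmin, ymin, xmax, ymax) boxes with 'label: score' captions."""
    annotated = image.copy()
    draw = ImageDraw.Draw(annotated)
    for label, score, box in zip(labels, scores, boxes):
        draw.rectangle(box, outline="red", width=2)
        draw.text((box[0], max(box[1] - 12, 0)), f"{label}: {score:.2f}", fill="red")
    return annotated

# Hypothetical detections in the same format the post-processing step returns
image = Image.new("RGB", (640, 480), "white")
annotated = draw_detections(image, ["cat"], [0.97], [(50.0, 60.0, 300.0, 400.0)])
annotated.save("detections.png")
```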
📚 Documentation
Overview
The D-FINE model was proposed in D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement by Yansong Peng, Hebei Li, Peixi Wu, Yueyi Zhang, Xiaoyan Sun, and Feng Wu. This model was contributed by VladOS95-cyber with the help of @qubvel-hf. This is the Hugging Face Transformers implementation of D-FINE. Checkpoint names follow this suffix convention:
- `_coco` -> model trained on COCO
- `_obj365` -> model trained on Object365
- `_obj2coco` -> model trained on Object365 and then finetuned on COCO
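The suffix convention above can be turned into a small helper for building checkpoint names. This is only a sketch: apart from `ustc-community/dfine-medium-obj2coco`, which appears in the usage example, the existence of any particular size/variant combination on the Hub is an assumption you should verify:

```python
# Hypothetical helper following the checkpoint naming convention described above.
# Only "ustc-community/dfine-medium-obj2coco" is confirmed by this README;
# other combinations are assumptions.
VARIANTS = {"coco", "obj365", "obj2coco"}

def dfine_checkpoint(size: str, variant: str) -> str:
    """Build a Hub checkpoint name, e.g. dfine_checkpoint('medium', 'obj2coco')."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {sorted(VARIANTS)}")
    return f"ustc-community/dfine-{size}-{variant}"

print(dfine_checkpoint("medium", "obj2coco"))  # ustc-community/dfine-medium-obj2coco
```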
Performance
D-FINE is a powerful real-time object detector that achieves outstanding localization precision by redefining the bounding box regression task in DETR models. D-FINE comprises two key components: Fine-grained Distribution Refinement (FDR) and Global Optimal Localization Self-Distillation (GO-LSD).

Training
D-FINE is trained on the COCO (Lin et al. [2014]) train2017 split and validated on the COCO val2017 split. We report the standard AP metric (averaged over IoU thresholds uniformly sampled from 0.50 to 0.95 with a step size of 0.05), as well as APval_5000, which is commonly used in real-world scenarios.
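The COCO-style averaging described above can be made concrete with a short sketch. The per-threshold AP values below are invented purely for illustration; real values come from an evaluator such as pycocotools:

```python
# The 10 uniformly sampled IoU thresholds: 0.50, 0.55, ..., 0.95
THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

def coco_ap(ap_per_iou: dict) -> float:
    """Average AP over the 10 IoU thresholds (the standard COCO AP)."""
    return sum(ap_per_iou[t] for t in THRESHOLDS) / len(THRESHOLDS)

# Hypothetical per-threshold AP values for illustration only
example = {t: 0.60 - 0.02 * i for i, t in enumerate(THRESHOLDS)}
print(round(coco_ap(example), 3))  # 0.51
```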
Applications
D-FINE is ideal for real-time object detection in diverse applications such as autonomous driving, surveillance systems, robotics, and retail analytics. Its enhanced flexibility and deployment-friendly design make it suitable for both edge devices and large-scale systems, ensuring high accuracy and speed in dynamic, real-world environments.
📄 License
This project is licensed under the Apache-2.0 license.