DETR-ResNet-50-DC5-Panoptic Open-source Model: A Practical Choice for End-to-end Object Detection and Panoptic Segmentation

Detr Resnet 50 Dc5 Panoptic

Developed by facebook

DETR is an end-to-end object detection model combining convolutional neural networks and Transformer architecture, supporting panoptic segmentation tasks.

Image Segmentation

Transformers

Open Source License: Apache-2.0 #End-to-end object detection #Panoptic segmentation #Transformer architecture

Downloads 45

Release Time: 3/2/2022

Model Overview

This model adopts a Transformer encoder-decoder structure, achieving object detection through object query mechanisms, and can directly output class labels, bounding boxes, and segmentation masks.

Model Features

End-to-end training

Directly outputs detection results without complex hand-designed components (such as anchor boxes or non-maximum suppression).

Bipartite matching loss

Uses the Hungarian algorithm to establish optimal matching between predictions and annotations, combining cross-entropy and bounding box loss for optimization.

Multi-task support

Can be extended to a panoptic segmentation model by adding a mask head, simultaneously performing object detection and instance segmentation.

Model Capabilities

Object detection

Instance segmentation

Panoptic segmentation

Image analysis

Use Cases

Intelligent surveillance

Construction site safety monitoring

Detects workers, equipment, and hazardous areas on construction sites

Accurately identifies multiple targets in construction scenarios

Retail analytics

Product recognition

Automatically identifies and counts products on shelves

High detection accuracy for common products like apples and oranges

🚀 DETR (End-to-End Object Detection) model with ResNet-50 backbone (dilated C5 stage)

The DETR model is an encoder-decoder transformer for object detection and panoptic segmentation, trained on the COCO 2017 dataset.

🚀 Quick Start

The DETR model can be used for panoptic segmentation. You can utilize the raw model directly. Check the model hub to find all available DETR models.

✨ Features

Encoder-Decoder Transformer: It is an encoder-decoder transformer with a convolutional backbone.
Object Queries: Uses object queries to detect objects in an image. For COCO, the number of object queries is set to 100.
Bipartite Matching Loss: Trained using a "bipartite matching loss" with the Hungarian matching algorithm.
Panoptic Segmentation: Can be extended to perform panoptic segmentation by adding a mask head.
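
The mask-head item above can be illustrated with a small sketch. It assumes the mask head yields one score map per object query and forms a panoptic map by giving each pixel to the query with the highest score. The function name `merge_masks` and the toy 2 x 3 inputs are hypothetical; real post-processing typically also drops low-confidence queries and very small segments, which this sketch leaves out.

```python
def merge_masks(mask_scores):
    """Assign every pixel to the query whose mask score is highest.

    mask_scores: list of Q score maps, each a list of H rows of W floats.
    Returns an H x W map of query indices (one segment id per pixel).
    """
    num_queries = len(mask_scores)
    height = len(mask_scores[0])
    width = len(mask_scores[0][0])
    panoptic = []
    for y in range(height):
        row = []
        for x in range(width):
            # Pick the query with the largest score at this pixel.
            best = max(range(num_queries), key=lambda q: mask_scores[q][y][x])
            row.append(best)
        panoptic.append(row)
    return panoptic


# Toy example (hypothetical values): two queries on a 2 x 3 image.
scores = [
    [[0.9, 0.8, 0.1], [0.7, 0.2, 0.1]],  # query 0 is strong on the left
    [[0.1, 0.3, 0.9], [0.2, 0.6, 0.8]],  # query 1 is strong on the right
]
segment_map = merge_masks(scores)
```

Because every pixel receives exactly one segment id, the result has no overlapping masks, which is the property that distinguishes panoptic output from independent per-instance masks.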

💻 Usage Examples

Basic Usage

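from transformers import DetrFeatureExtractor, DetrForSegmentation
from PIL import Image
import requests

# Download a sample image from the COCO 2017 validation set
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)

# Load the preprocessor and the panoptic segmentation model from the hub.
# Note: recent transformers releases deprecate DetrFeatureExtractor in favor of DetrImageProcessor.
feature_extractor = DetrFeatureExtractor.from_pretrained('facebook/detr-resnet-50-dc5-panoptic')
model = DetrForSegmentation.from_pretrained('facebook/detr-resnet-50-dc5-panoptic')

# Resize and normalize the image, then run a forward pass
inputs = feature_extractor(images=image, return_tensors="pt")
outputs = model(**inputs)
# model predicts COCO classes, bounding boxes, and masks
logits = outputs.logits       # per-query class scores, including a "no object" class
bboxes = outputs.pred_boxes   # per-query boxes, normalized (center_x, center_y, width, height)
masks = outputs.pred_masks    # per-query mask logits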
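
The raw outputs above still need decoding before they are useful. The following is a minimal sketch in plain Python, not the library's own post-processing: it assumes the last logit of each query is the "no object" class and that boxes are normalized (center_x, center_y, width, height). The helper names and the 0.85 threshold are illustrative assumptions.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def cxcywh_to_xyxy(box, image_width, image_height):
    """Convert a normalized (cx, cy, w, h) box to pixel (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * image_width,
        (cy - h / 2) * image_height,
        (cx + w / 2) * image_width,
        (cy + h / 2) * image_height,
    )


def decode_queries(logits, boxes, image_width, image_height, threshold=0.85):
    """Keep queries whose best real class beats the threshold.

    Assumes the last logit of each query is the "no object" class.
    Returns a list of (label, score, pixel_box) tuples.
    """
    detections = []
    for query_logits, box in zip(logits, boxes):
        probs = softmax(query_logits)[:-1]  # drop the "no object" slot
        score = max(probs)
        if score > threshold:
            label = probs.index(score)
            pixel_box = cxcywh_to_xyxy(box, image_width, image_height)
            detections.append((label, score, pixel_box))
    return detections


# Toy input (hypothetical): 2 queries, 2 real classes plus "no object".
toy_logits = [[0.0, 6.0, 0.0], [0.0, 0.0, 6.0]]
toy_boxes = [(0.5, 0.5, 0.2, 0.4), (0.1, 0.1, 0.1, 0.1)]
detections = decode_queries(toy_logits, toy_boxes, image_width=640, image_height=480)
```

In the toy input, the second query puts almost all of its probability on "no object", so only the first query survives the threshold.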

📚 Documentation

Model Description

The DETR model is an encoder-decoder transformer with a convolutional backbone. Two heads are added on top of the decoder outputs for object detection: a linear layer for class labels and an MLP for bounding boxes. The model uses object queries to detect objects. For COCO, there are 100 object queries.

It is trained using a "bipartite matching loss". The Hungarian matching algorithm creates an optimal one-to-one mapping between queries and annotations. Standard cross-entropy and a linear combination of L1 and generalized IoU loss are used to optimize the model parameters. DETR can be extended for panoptic segmentation by adding a mask head.
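
To make the one-to-one matching concrete, here is a minimal sketch under stated assumptions. It swaps in a brute-force search over permutations for the Hungarian algorithm; both return a lowest-cost one-to-one assignment, but brute force is only practical for tiny inputs (in practice, `scipy.optimize.linear_sum_assignment` solves the same assignment problem efficiently). The cost below uses only the class probability and an L1 box term; the generalized IoU term is omitted for brevity, and the weight 5.0 and all function names are illustrative.

```python
from itertools import permutations


def l1_distance(box_a, box_b):
    """Sum of absolute coordinate differences between two boxes."""
    return sum(abs(a - b) for a, b in zip(box_a, box_b))


def match_cost(pred_probs, pred_box, target_label, target_box, box_weight=5.0):
    """Cost of pairing one prediction with one annotation (lower is better).

    Rewards high probability for the annotated class and a nearby box.
    The generalized IoU term is left out to keep the sketch short.
    """
    return -pred_probs[target_label] + box_weight * l1_distance(pred_box, target_box)


def bipartite_match(predictions, targets):
    """Return the lowest-cost one-to-one pairs as (prediction_idx, target_idx).

    Brute force over permutations; assumes len(predictions) >= len(targets).
    """
    best_cost, best_pairs = float("inf"), []
    for chosen in permutations(range(len(predictions)), len(targets)):
        cost = sum(
            match_cost(predictions[p][0], predictions[p][1], targets[t][0], targets[t][1])
            for t, p in enumerate(chosen)
        )
        if cost < best_cost:
            best_cost = cost
            best_pairs = [(p, t) for t, p in enumerate(chosen)]
    return sorted(best_pairs)


# Hypothetical data: three queries (class probabilities, box), two annotations (label, box).
predictions = [
    ([0.1, 0.9], (0.8, 0.8, 0.2, 0.2)),
    ([0.7, 0.3], (0.2, 0.2, 0.1, 0.1)),
    ([0.5, 0.5], (0.5, 0.5, 0.5, 0.5)),
]
targets = [(0, (0.2, 0.2, 0.1, 0.1)), (1, (0.8, 0.8, 0.2, 0.2))]
pairs = bipartite_match(predictions, targets)
```

Queries left unmatched (query 2 in this toy case) are the ones that would be trained to predict "no object", which is how the model learns to avoid duplicate detections without non-maximum suppression.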

Intended Uses & Limitations

You can use the raw model for panoptic segmentation. See the model hub for all available DETR models.

Training Data

The DETR model was trained on COCO 2017 panoptic, which has 118k/5k annotated images for training/validation respectively.

Training Procedure

Preprocessing

The exact preprocessing details can be found here. Images are resized/rescaled so that the shortest side is at least 800 pixels and the largest side at most 1333 pixels, and normalized across RGB channels with ImageNet mean (0.485, 0.456, 0.406) and standard deviation (0.229, 0.224, 0.225).
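
The resize and normalization rules can be sketched as follows. This is one reading of the description above, assuming the shorter side is scaled to 800 pixels unless that would push the longer side past 1333, in which case the longer side is capped instead. The function names are hypothetical, and the exact rounding in the reference preprocessing may differ by a pixel.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def resized_shape(width, height, shortest=800, longest=1333):
    """Scale so the shorter side equals `shortest`, capping the longer side at `longest`."""
    scale = shortest / min(width, height)
    if max(width, height) * scale > longest:
        # The longer side would overshoot, so scale by the cap instead.
        scale = longest / max(width, height)
    return round(width * scale), round(height * scale)


def normalize_pixel(rgb):
    """Map an 8-bit RGB pixel to mean/std-normalized channel values."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


print(resized_shape(640, 480))    # shorter side is scaled up to 800
print(resized_shape(2000, 500))   # very wide image: longer side capped at 1333
print(normalize_pixel((255, 255, 255)))
```

Note that for very elongated images the cap takes priority, so the shorter side can end up below 800 pixels.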

Training

The model was trained for 300 epochs on 16 V100 GPUs, taking 3 days with 4 images per GPU (total batch size of 64).

Evaluation Results

This model achieves a box AP (average precision) of 40.2, a segmentation AP of 31.9, and a PQ (panoptic quality) of 44.6 on COCO 2017 validation. For more details, refer to Table 5 of the original paper.
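
For readers unfamiliar with PQ, the sketch below shows how the metric is commonly defined for a single class: the summed IoU of matched segment pairs divided by the number of true positives plus half the false positives and half the false negatives, where a predicted and a ground-truth segment count as matched when their IoU exceeds 0.5. The function name and the numbers are illustrative and are not taken from the evaluation of this model; reported PQ is averaged over classes and expressed as a percentage.

```python
def panoptic_quality(matched_ious, num_false_positives, num_false_negatives):
    """PQ for one class: summed IoU of matches over TP + FP/2 + FN/2."""
    true_positives = len(matched_ious)
    denominator = true_positives + 0.5 * num_false_positives + 0.5 * num_false_negatives
    if denominator == 0:
        return 0.0  # no predictions and no ground truth for this class
    return sum(matched_ious) / denominator


# Hypothetical class with three matched segments, one spurious and one missed segment.
pq = panoptic_quality([0.9, 0.7, 0.8], num_false_positives=1, num_false_negatives=1)
```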

BibTeX entry and citation info

@article{DBLP:journals/corr/abs-2005-12872,
  author    = {Nicolas Carion and
               Francisco Massa and
               Gabriel Synnaeve and
               Nicolas Usunier and
               Alexander Kirillov and
               Sergey Zagoruyko},
  title     = {End-to-End Object Detection with Transformers},
  journal   = {CoRR},
  volume    = {abs/2005.12872},
  year      = {2020},
  url       = {https://arxiv.org/abs/2005.12872},
  archivePrefix = {arXiv},
  eprint    = {2005.12872},
  timestamp = {Thu, 28 May 2020 17:38:09 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2005-12872.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

📄 License

The model is licensed under the Apache-2.0 license.

Property	Details
Model Type	DETR (End-to-End Object Detection) model with ResNet-50 backbone (dilated C5 stage)
Training Data	COCO 2017 panoptic (118k annotated images for training, 5k for validation)
License	Apache-2.0
