DETR (End-to-End Object Detection) model with ResNet-50 backbone (dilated C5 stage)
The DETR model is an encoder-decoder transformer for object detection and panoptic segmentation, trained on the COCO 2017 dataset.
Quick Start
The DETR model can be used for panoptic segmentation, and the raw model can be used directly. See the model hub for all available DETR models.
⨠Features
- Encoder - Decoder Transformer: It is an encoder - decoder transformer with a convolutional backbone.
- Object Queries: Uses object queries to detect objects in an image. For COCO, the number of object queries is set to 100.
- Bipartite Matching Loss: Trained using a "bipartite matching loss" with the Hungarian matching algorithm.
- Panoptic Segmentation: Can be extended to perform panoptic segmentation by adding a mask head.
Usage Examples
Basic Usage
from transformers import DetrFeatureExtractor, DetrForSegmentation
from PIL import Image
import requests
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'  # sample image from the COCO 2017 validation set
image = Image.open(requests.get(url, stream=True).raw)
feature_extractor = DetrFeatureExtractor.from_pretrained('facebook/detr-resnet-50-dc5-panoptic')
model = DetrForSegmentation.from_pretrained('facebook/detr-resnet-50-dc5-panoptic')
inputs = feature_extractor(images=image, return_tensors="pt")  # resize, normalize, and convert to PyTorch tensors
outputs = model(**inputs)  # forward pass
logits = outputs.logits  # class logits, one row per object query
bboxes = outputs.pred_boxes  # normalized (center_x, center_y, width, height) boxes
masks = outputs.pred_masks  # mask logits, one per object query
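The raw outputs still need post-processing before they are usable as detections. The following is a minimal pure-Python sketch, assuming each query's logits end with a "no object" entry and each box is a normalized (center_x, center_y, width, height) tuple. The helper names and the 0.9 threshold are illustrative choices, not part of the library.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cxcywh_to_xyxy(box, image_width, image_height):
    """Convert a normalized (cx, cy, w, h) box to absolute (x0, y0, x1, y1) pixels."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * image_width,
        (cy - h / 2) * image_height,
        (cx + w / 2) * image_width,
        (cy + h / 2) * image_height,
    )

def keep_confident(query_logits, query_boxes, image_size, threshold=0.9):
    """Return (class_id, score, pixel_box) for queries whose best real class beats threshold.

    Assumption: the last logit of every query is the "no object" class.
    """
    image_width, image_height = image_size
    detections = []
    for logits, box in zip(query_logits, query_boxes):
        probs = softmax(logits)[:-1]  # drop the trailing "no object" probability
        score = max(probs)
        if score > threshold:
            class_id = probs.index(score)
            pixel_box = cxcywh_to_xyxy(box, image_width, image_height)
            detections.append((class_id, score, pixel_box))
    return detections

# Two hypothetical queries over 2 real classes plus "no object"
toy_logits = [[8.0, 0.0, 0.0], [0.0, 0.0, 8.0]]
toy_boxes = [(0.5, 0.5, 0.2, 0.4), (0.1, 0.1, 0.1, 0.1)]
detections = keep_confident(toy_logits, toy_boxes, image_size=(640, 480))
```

With real model outputs, the same logic would be applied to `logits[0]` and `bboxes[0]` after converting the tensors to lists. The second toy query is dominated by "no object" and is therefore dropped.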
Documentation
Model Description
The DETR model is an encoder-decoder transformer with a convolutional backbone. Two heads are added on top of the decoder outputs for object detection: a linear layer for the class labels and an MLP for the bounding boxes. The model uses object queries to detect objects. For COCO, there are 100 object queries.
It is trained using a "bipartite matching loss". The Hungarian matching algorithm creates an optimal one-to-one mapping between queries and annotations. Standard cross-entropy (for the classes) and a linear combination of the L1 and generalized IoU losses (for the bounding boxes) are then used to optimize the model parameters. DETR can be extended for panoptic segmentation by adding a mask head.
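To make the matching step concrete, here is a small pure-Python sketch. It is not the training code: it replaces the Hungarian algorithm with an exhaustive search over permutations, which finds the same optimal assignment but only scales to toy inputs. It also assumes boxes in (x0, y0, x1, y1) form with positive area, and the cost weights are illustrative placeholders.

```python
from itertools import permutations

def box_area(box):
    x0, y0, x1, y1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)

def generalized_iou(a, b):
    """GIoU of two (x0, y0, x1, y1) boxes: IoU minus the empty share of the enclosing box."""
    inter = box_area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))
    union = box_area(a) + box_area(b) - inter
    hull = box_area((min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3])))
    return inter / union - (hull - union) / hull

def l1_distance(a, b):
    return sum(abs(p - q) for p, q in zip(a, b))

def pair_cost(pred, target, l1_weight=1.0, giou_weight=1.0):
    """Cost of assigning one prediction to one annotation (lower is better)."""
    class_cost = -pred["probs"][target["label"]]
    return (class_cost
            + l1_weight * l1_distance(pred["box"], target["box"])
            - giou_weight * generalized_iou(pred["box"], target["box"]))

def match(preds, targets):
    """Exhaustively find the one-to-one assignment with minimal total cost.

    Returns (pred_index, target_index) pairs. Predictions left unmatched are
    the ones that would be supervised as "no object".
    """
    best_cost, best_assignment = float("inf"), None
    for chosen in permutations(range(len(preds)), len(targets)):
        cost = sum(pair_cost(preds[p], targets[t]) for t, p in enumerate(chosen))
        if cost < best_cost:
            best_cost = cost
            best_assignment = [(p, t) for t, p in enumerate(chosen)]
    return best_assignment

# Three hypothetical query predictions and two annotations
preds = [
    {"probs": [0.1, 0.9], "box": (0.5, 0.5, 0.9, 0.9)},
    {"probs": [0.8, 0.2], "box": (0.1, 0.1, 0.4, 0.4)},
    {"probs": [0.5, 0.5], "box": (0.0, 0.0, 1.0, 1.0)},
]
targets = [
    {"label": 0, "box": (0.1, 0.1, 0.4, 0.4)},
    {"label": 1, "box": (0.5, 0.5, 0.9, 0.9)},
]
assignment = match(preds, targets)
```

For realistic sizes (100 queries per image), a polynomial-time solver such as `scipy.optimize.linear_sum_assignment` would be used on the full cost matrix instead of enumerating permutations.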
Intended Uses & Limitations
You can use the raw model for panoptic segmentation. See the model hub for all available DETR models.
Training Data
The DETR model was trained on COCO 2017 panoptic, which has 118k/5k annotated images for training/validation respectively.
Training Procedure
Preprocessing
The exact preprocessing details can be found here. Images are resized/rescaled so that the shortest side is at least 800 pixels and the largest side at most 1333 pixels, and normalized across RGB channels with ImageNet mean (0.485, 0.456, 0.406) and standard deviation (0.229, 0.224, 0.225).
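As a rough illustration of the resize and normalization rule, the sketch below computes a target size and standardizes a single pixel. It assumes one common reading of the rule: the shorter side is scaled to 800 pixels unless that would push the longer side past 1333. The function names are hypothetical, and rounding details differ between implementations.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def target_size(width, height, short_side=800, long_side_max=1333):
    """Scale so the shorter side reaches short_side unless the longer side would exceed long_side_max."""
    scale = min(short_side / min(width, height), long_side_max / max(width, height))
    return round(width * scale), round(height * scale)

def normalize_pixel(rgb):
    """Map an 8-bit RGB pixel to per-channel standardized floats."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )

# A 640x480 image is limited by its short side, a 2000x500 one by its long side
size_a = target_size(640, 480)
size_b = target_size(2000, 500)
white = normalize_pixel((255, 255, 255))
```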
Training
The model was trained for 300 epochs on 16 V100 GPUs, taking 3 days with 4 images per GPU (total batch size of 64).
Evaluation Results
This model achieves a box AP (average precision) of 40.2, a segmentation AP of 31.9, and a PQ (panoptic quality) of 44.6 on COCO 2017 validation. For more details, refer to Table 5 of the original paper.
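For readers unfamiliar with PQ, the sketch below follows the usual definition from the panoptic segmentation literature: for one class, the summed IoU of matched segment pairs divided by the number of true positives plus half the false positives and half the false negatives. The function name and the zero-denominator convention are assumptions made for illustration. Reported scores average this quantity over classes and scale it to percent.

```python
def panoptic_quality(matched_ious, num_false_positives, num_false_negatives):
    """PQ for one class: summed IoU of matched segments over TP + FP/2 + FN/2."""
    true_positives = len(matched_ious)
    denominator = true_positives + 0.5 * num_false_positives + 0.5 * num_false_negatives
    if denominator == 0:
        return 0.0  # assumption: a class with no segments contributes 0 here
    return sum(matched_ious) / denominator

# Two matched segments, one spurious prediction, one missed annotation
pq = panoptic_quality([0.8, 0.6], num_false_positives=1, num_false_negatives=1)
```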
BibTeX entry and citation info
@article{DBLP:journals/corr/abs-2005-12872,
author = {Nicolas Carion and
Francisco Massa and
Gabriel Synnaeve and
Nicolas Usunier and
Alexander Kirillov and
Sergey Zagoruyko},
title = {End-to-End Object Detection with Transformers},
journal = {CoRR},
volume = {abs/2005.12872},
year = {2020},
url = {https://arxiv.org/abs/2005.12872},
archivePrefix = {arXiv},
eprint = {2005.12872},
timestamp = {Thu, 28 May 2020 17:38:09 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-2005-12872.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
License
The model is licensed under the Apache-2.0 license.
| Property | Details |
|---|---|
| Model Type | DETR (End-to-End Object Detection) model with ResNet-50 backbone (dilated C5 stage) |
| Training Data | COCO 2017 panoptic (118k annotated images for training, 5k for validation) |
| License | Apache-2.0 |