DETR (End-to-End Object Detection) model with ResNet-101-DC5 backbone trained on the SKU110K dataset with 400 num_queries
This is a DEtection TRansformer (DETR) model trained end-to-end on the SKU110K object detection dataset (8k annotated images). It differs from the original model in two ways: it uses 400 num_queries, and it is pre-trained on the SKU110K dataset.
Quick Start
Features
- Trained end-to-end on the SKU110K object detection dataset.
- Differs from the original model: it uses 400 num_queries and is pre-trained on SKU110K.
Usage Examples
Basic Usage
import requests
import torch
from PIL import Image, ImageOps
from transformers import DetrForObjectDetection, DetrImageProcessor
# Download the example image and apply its EXIF orientation so it is upright.
url = "https://github.com/Isalia20/DETR-finetune/blob/main/IMG_3507.jpg?raw=true"
image = Image.open(requests.get(url, stream=True).raw)
image = ImageOps.exif_transpose(image)
# The image processor comes from the base checkpoint; the weights are the SKU110K model.
processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-101-dc5")
model = DetrForObjectDetection.from_pretrained("isalia99/detr-resnet-101-dc5-sku110k")
model = model.eval()
inputs = processor(images=image, return_tensors="pt")
# Inference only, so gradient tracking is not needed.
with torch.no_grad():
    outputs = model(**inputs)
# image.size is (width, height); post-processing expects (height, width).
target_sizes = torch.tensor([image.size[::-1]])
# Keep only detections with a confidence above 0.8.
results = processor.post_process_object_detection(outputs, target_sizes=target_sizes, threshold=0.8)[0]
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    box = [round(i, 2) for i in box.tolist()]
    print(
        f"Detected {model.config.id2label[label.item()]} with confidence "
        f"{round(score.item(), 3)} at location {box}"
    )
This should output:
Detected LABEL_1 with confidence 0.983 at location [665.49, 480.05, 708.15, 650.11]
Detected LABEL_1 with confidence 0.938 at location [204.99, 1405.9, 239.9, 1546.5]
...
Detected LABEL_1 with confidence 0.998 at location [772.85, 169.49, 829.67, 372.18]
Detected LABEL_1 with confidence 0.999 at location [828.28, 1475.16, 874.37, 1593.43]
Currently, both the feature extractor and model support PyTorch.
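The boxes in the output above are in absolute (xmin, ymin, xmax, ymax) pixel coordinates. As a minimal sketch of what can be done with them, the plain-Python helper below counts the detections above a confidence cutoff and reports their mean width and height. The function name `summarize_detections` and the `min_score` parameter are hypothetical and not part of the model or of transformers; the sample values are the four detections printed above.

```python
from typing import Dict, Sequence


def summarize_detections(
    boxes: Sequence[Sequence[float]],
    scores: Sequence[float],
    min_score: float = 0.9,
) -> Dict[str, float]:
    """Count detections scoring at least `min_score` and average their box size.

    Boxes are expected as (xmin, ymin, xmax, ymax) in pixels.
    """
    kept = [box for box, score in zip(boxes, scores) if score >= min_score]
    count = len(kept)
    if count == 0:
        return {"count": 0, "mean_width": 0.0, "mean_height": 0.0}
    mean_width = sum(box[2] - box[0] for box in kept) / count
    mean_height = sum(box[3] - box[1] for box in kept) / count
    return {"count": count, "mean_width": mean_width, "mean_height": mean_height}


# Sample values copied from the printed detections above.
boxes = [
    [665.49, 480.05, 708.15, 650.11],
    [204.99, 1405.9, 239.9, 1546.5],
    [772.85, 169.49, 829.67, 372.18],
    [828.28, 1475.16, 874.37, 1593.43],
]
scores = [0.983, 0.938, 0.998, 0.999]

summary = summarize_detections(boxes, scores, min_score=0.95)
print(summary)
```

With real model output, `results["boxes"].tolist()` and `results["scores"].tolist()` can be passed in place of the sample lists.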
Documentation
Training data
| Property | Details |
|---|---|
| Model Type | DETR model with ResNet-101-DC5 backbone |
| Training Data | SKU110K dataset, with 8,219/588/2,936 annotated images for training/validation/test respectively |
Training procedure
Training
The model was trained on a single RTX 4060 Ti GPU in two stages: 60 epochs fine-tuning the decoder only, and 60 epochs fine-tuning the whole network. Both stages used a batch size of 1 with gradients accumulated over 8 steps, giving an effective batch size of 8.
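To illustrate why a batch size of 1 with 8 accumulation steps behaves like a batch of 8, here is a toy sketch in plain Python. It uses a one-parameter least-squares model rather than DETR, and it assumes each per-sample gradient is scaled by 1/8 before being summed; whether the actual training code applies that scaling is not stated here. All function names are hypothetical.

```python
def grad_single(w: float, x: float, y: float) -> float:
    """Gradient of 0.5 * (w * x - y) ** 2 with respect to w for one sample."""
    return (w * x - y) * x


def accumulated_step(w: float, samples, lr: float, accum_steps: int) -> float:
    """One weight update after accumulating `accum_steps` batch-size-1 gradients."""
    grad_sum = 0.0
    for x, y in samples[:accum_steps]:
        # Scale each micro-batch gradient so that the accumulated sum is an average.
        grad_sum += grad_single(w, x, y) / accum_steps
    return w - lr * grad_sum


def full_batch_step(w: float, samples, lr: float) -> float:
    """One weight update using the mean gradient over the whole batch at once."""
    mean_grad = sum(grad_single(w, x, y) for x, y in samples) / len(samples)
    return w - lr * mean_grad


# Eight toy samples drawn from y = 2x.
samples = [(float(i), 2.0 * i) for i in range(1, 9)]

w_accum = accumulated_step(0.0, samples, lr=0.01, accum_steps=8)
w_full = full_batch_step(0.0, samples, lr=0.01)
print(w_accum, w_full)
```

Both updates land on the same weight, which is the point of accumulation: memory use stays at one sample per forward pass while the optimizer step matches a larger batch.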
Evaluation results
This model achieves an mAP of 59.8 on the SKU110K validation set. The result was calculated with the torchmetrics MeanAveragePrecision class.
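Mean average precision counts a predicted box as a match only when its intersection over union (IoU) with a ground-truth box reaches a threshold. The sketch below computes IoU for two (xmin, ymin, xmax, ymax) boxes in plain Python. It is for illustration only and is not the torchmetrics implementation; the function name `box_iou` is an assumption, and the IoU thresholds behind the 59.8 figure are not stated here.

```python
from typing import Sequence


def box_iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    # Corners of the overlapping rectangle, if any.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp to zero so disjoint boxes give no intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


overlap = box_iou((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
identical = box_iou((0.0, 0.0, 2.0, 2.0), (0.0, 0.0, 2.0, 2.0))
disjoint = box_iou((0.0, 0.0, 1.0, 1.0), (2.0, 2.0, 3.0, 3.0))
print(overlap, identical, disjoint)
```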
Training Code
The training code is released in this repository: Repo Link. It is not yet finalized or well tested, but the main components are in place.
License
This project is licensed under the Apache-2.0 License.