Detr-ResNet-101-DC5-SKU110K Open-Source Object Detection Model - Precise Recognition and Efficiency Improvement

Detr Resnet 101 Dc5 Sku110k

Developed by isalia99

This is an object detection model based on the DETR architecture, using ResNet-101-DC5 as the backbone network and trained on the SKU110K dataset with the number of queries set to 400.

Object Detection

Transformers

Open Source License: Apache-2.0 #Retail Shelf Detection #400 Query Optimization #DETR Architecture

Downloads 129

Release Time: 3/18/2024

Model Overview

This model is designed for object detection and is particularly suited to retail product detection.

Model Features

400 Query Design

Where the original DETR model uses 100 object queries, this model uses 400. Because DETR emits at most one detection per query, the higher count raises the cap on detections per image, which can help with densely packed small objects.

SKU110K Dataset Pre-Training

Optimized specifically for retail product detection scenarios, trained end-to-end on the SKU110K dataset.

End-to-End Training

Adopts DETR's end-to-end training approach, which removes the need for hand-designed post-processing steps such as non-maximum suppression.
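DETR predicts a fixed-size set: each of the `num_queries` object queries yields at most one detection, so the query count caps how many objects can be found in one image. A minimal sketch of that cap follows. The function name and the shelf size of 250 products are hypothetical, chosen only for illustration, and 100 is the default query count of the original DETR.

```python
def recall_upper_bound(num_objects: int, num_queries: int) -> float:
    """Best-case fraction of objects a set predictor with `num_queries` slots can find.

    Each query yields at most one detection, so at most `num_queries`
    objects can be matched no matter how good the model is.
    """
    if num_objects <= 0:
        return 1.0  # nothing to find, so nothing can be missed
    return min(num_objects, num_queries) / num_objects


# Hypothetical densely packed shelf image with 250 products.
shelf_objects = 250
for queries in (100, 400):  # original DETR default vs. this model
    bound = recall_upper_bound(shelf_objects, queries)
    print(f"num_queries={queries}: recall can be at most {bound:.0%}")
```

With these illustrative numbers, 100 queries could recover at most 40% of the products, while 400 queries leave room for all of them.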

Model Capabilities

Object Detection

Retail Product Recognition

Dense Small Object Detection

Use Cases

Retail Industry

Shelf Product Detection

Automatically detects and identifies products on retail shelves

Achieves mAP of 59.8 on the SKU110K validation set

Inventory Management

Assists retail stores in automated inventory counting
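For inventory counting, the simplest approach is to count the detections whose confidence clears a threshold. The sketch below assumes scores like the `results["scores"]` values from the usage example, converted to plain floats. The function name and the sample scores are hypothetical.

```python
from typing import Iterable


def count_products(scores: Iterable[float], threshold: float = 0.8) -> int:
    """Count detections whose confidence is strictly above `threshold`."""
    return sum(1 for score in scores if score > threshold)


# Hypothetical confidences for one shelf image.
sample_scores = [0.983, 0.938, 0.41, 0.998, 0.79, 0.999]
print(count_products(sample_scores))       # default 0.8 threshold, as in the usage example
print(count_products(sample_scores, 0.3))  # a lower threshold counts more boxes
```

The count depends on the threshold, so it is worth validating the chosen value against manually counted shelves before relying on it.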

🚀 DETR (End-to-End Object Detection) model with ResNet-101-DC5 backbone trained on SKU110K Dataset with 400 num_queries

This is a DEtection TRansformer (DETR) model trained end-to-end on the SKU110K object detection dataset (about 8k annotated training images). It differs from the original model in two ways: it uses 400 object queries (num_queries) and it is trained on the SKU110K dataset.

🚀 Quick Start

✨ Features

Trained end-to-end on the SKU110K object detection dataset.
Differs from the original model by using 400 num_queries and being trained on SKU110K.

📦 Installation

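The original model card gives no installation steps. The usage example below imports transformers, torch, Pillow (PIL), and requests, so those packages need to be installed. Its code comment also mentions a timm dependency unless a revision tag is specified.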

💻 Usage Examples

Basic Usage

from transformers import DetrImageProcessor, DetrForObjectDetection
import torch
from PIL import Image, ImageOps
import requests

# Download the sample shelf image and apply its EXIF orientation.
url = "https://github.com/Isalia20/DETR-finetune/blob/main/IMG_3507.jpg?raw=true"
image = Image.open(requests.get(url, stream=True).raw)
image = ImageOps.exif_transpose(image)

# you can specify the revision tag if you don't want the timm dependency
processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-101-dc5")
model = DetrForObjectDetection.from_pretrained("isalia99/detr-resnet-101-dc5-sku110k")
model.eval()

# Inference only, so skip gradient tracking.
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert outputs (bounding boxes and class logits) to absolute (x0, y0, x1, y1) boxes.
# target_sizes expects (height, width); PIL's image.size is (width, height).
# Only keep detections with score > 0.8.
target_sizes = torch.tensor([image.size[::-1]])
results = processor.post_process_object_detection(outputs, target_sizes=target_sizes, threshold=0.8)[0]

for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    box = [round(i, 2) for i in box.tolist()]
    print(
        f"Detected {model.config.id2label[label.item()]} with confidence "
        f"{round(score.item(), 3)} at location {box}"
    )

This should output:

Detected LABEL_1 with confidence 0.983 at location [665.49, 480.05, 708.15, 650.11]
Detected LABEL_1 with confidence 0.938 at location [204.99, 1405.9, 239.9, 1546.5]
...
Detected LABEL_1 with confidence 0.998 at location [772.85, 169.49, 829.67, 372.18]
Detected LABEL_1 with confidence 0.999 at location [828.28, 1475.16, 874.37, 1593.43]

Currently, both the feature extractor and model support PyTorch.
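To make the post-processing step less opaque, here is a dependency-free sketch of what it conceptually does for a single query: a softmax over the class logits with the trailing "no object" class excluded, a confidence threshold, and conversion of the normalized (center_x, center_y, width, height) box to absolute corner coordinates. This is an illustration under those assumptions, not the library's implementation. The function names and sample numbers are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_query(
    logits: List[float],
    box_cxcywh: Tuple[float, float, float, float],
    image_hw: Tuple[int, int],
    threshold: float = 0.8,
) -> Optional[Tuple[int, float, List[float]]]:
    """Return (label, score, [x0, y0, x1, y1]), or None if the score is too low.

    The last logit is treated as the "no object" class and is never returned.
    """
    probs = softmax(logits)[:-1]  # drop "no object"
    score = max(probs)
    label = probs.index(score)
    if score <= threshold:
        return None
    cx, cy, w, h = box_cxcywh
    height, width = image_hw  # same (height, width) order as target_sizes
    box = [
        (cx - w / 2) * width,
        (cy - h / 2) * height,
        (cx + w / 2) * width,
        (cy + h / 2) * height,
    ]
    return label, score, box


# Hypothetical query output for a 1000 x 2000 (height x width) image:
# logits for [class 0, class 1, "no object"] and a normalized box.
print(decode_query([0.1, 6.0, 0.2], (0.5, 0.5, 0.1, 0.2), (1000, 2000)))
```

Applying this to each of the 400 queries and keeping the non-None results gives the list of detections for one image.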

📚 Documentation

Training data

Property	Details
Model Type	DETR model with ResNet-101-DC5 backbone
Training Data	The DETR model was trained on the SKU110K dataset, which consists of 8,219/588/2,936 annotated images for training/validation/test, respectively.
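The split sizes above work out to roughly a 70/5/25 division, which the short calculation below confirms. Only the three counts come from the table. The variable names are arbitrary.

```python
# Image counts per split, as listed in the training data table.
splits = {"train": 8219, "validation": 588, "test": 2936}

total = sum(splits.values())
shares = {name: count / total for name, count in splits.items()}

print(f"total images: {total}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```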

Training procedure

Training

The model was trained on a single RTX 4060 Ti GPU in two stages: 60 epochs fine-tuning only the decoder, followed by 60 epochs fine-tuning the whole network. Both stages used a batch size of 1 with gradients accumulated over 8 steps.
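Gradient accumulation lets a batch size of 1 behave like a larger batch: gradients from 8 consecutive single-image steps are combined before one optimizer update. The dependency-free sketch below checks this equivalence on a toy one-parameter model, averaging the accumulated gradients. The model, data, and function names are hypothetical and unrelated to the released training code. It also estimates optimizer updates per epoch for the 8,219 training images, assuming a final partial group still triggers an update.

```python
import math
from typing import List, Tuple

Sample = Tuple[float, float]  # (input x, target y)


def grad_squared_error(w: float, x: float, y: float) -> float:
    """d/dw of (w * x - y) ** 2 for a toy one-parameter model."""
    return 2.0 * (w * x - y) * x


def accumulated_grad(w: float, samples: List[Sample]) -> float:
    """Accumulate per-sample gradients, each scaled by 1 / accumulation steps."""
    steps = len(samples)
    total = 0.0
    for x, y in samples:  # one micro-batch of size 1 per iteration
        total += grad_squared_error(w, x, y) / steps
    return total


def full_batch_grad(w: float, samples: List[Sample]) -> float:
    """Gradient of the mean loss over the whole batch at once."""
    return sum(grad_squared_error(w, x, y) for x, y in samples) / len(samples)


def updates_per_epoch(num_samples: int, batch_size: int, accum_steps: int) -> int:
    """Optimizer updates per epoch, counting a final partial group as one update."""
    return math.ceil(num_samples / (batch_size * accum_steps))


# Eight toy samples stand in for eight single-image micro-batches.
data = [(float(i), 3.0 * i + 1.0) for i in range(1, 9)]
print(accumulated_grad(0.5, data), full_batch_grad(0.5, data))
print(updates_per_epoch(8219, batch_size=1, accum_steps=8))
```

The two gradients agree, which is why a batch size of 1 with 8 accumulation steps is usually described as an effective batch size of 8.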

Evaluation results

This model achieves an mAP of 59.8 on the SKU110K validation set. The result was calculated with the torchmetrics MeanAveragePrecision class.
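Mean average precision matches predicted boxes to ground-truth boxes by intersection over union (IoU). As a rough illustration of that building block only, here is a plain-Python IoU for boxes in the (x0, y0, x1, y1) format printed by the usage example. It is not the torchmetrics implementation, and the function name and sample boxes are hypothetical.

```python
from typing import Sequence


def box_iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    # Overlap rectangle; width and height clamp to zero when boxes are disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical prediction and ground-truth boxes.
predicted = [0.0, 0.0, 10.0, 10.0]
ground_truth = [5.0, 5.0, 15.0, 15.0]
print(round(box_iou(predicted, ground_truth), 4))
```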

Training Code

The training code is released in the author's repository (Repo Link). It is not yet finalized or well tested, but the main components are there.

📄 License

This project is licensed under the Apache-2.0 License.
