🚀 YOLOS (tiny-sized) model
The YOLOS model is fine-tuned on COCO 2017 object detection and offers effective object detection capabilities.
🚀 Quick Start
The YOLOS model was fine-tuned on COCO 2017 object detection, which contains 118k annotated images. It was introduced in the paper You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection by Fang et al. and first released in this repository.
Disclaimer: The team releasing YOLOS did not write a model card for this model, so this model card has been written by the Hugging Face team.
✨ Features
- YOLOS is a Vision Transformer (ViT) trained using the DETR loss. A base-sized YOLOS model can achieve 42 AP on COCO validation 2017, similar to DETR and to more complex frameworks such as Faster R-CNN.
- The model is trained with a "bipartite matching loss": the predicted classes and bounding boxes of each of the N = 100 object queries are compared to the ground-truth annotations. The Hungarian matching algorithm creates an optimal one-to-one mapping between queries and annotations, and standard cross-entropy (for the classes) together with a linear combination of the L1 and generalized IoU losses (for the boxes) is used to optimize the model parameters.
📦 Installation
No specific installation steps are provided in the original document.
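However, the usage example below imports transformers, torch, PIL, and requests, so a typical setup (an assumption based on those imports, not taken from the original card) would be:

pip install transformers torch pillow requests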
💻 Usage Examples
Basic Usage
from transformers import YolosImageProcessor, YolosForObjectDetection
from PIL import Image
import torch
import requests

# Load an example image from the COCO 2017 validation set
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Load the model and its image processor from the Hugging Face Hub
model = YolosForObjectDetection.from_pretrained("hustvl/yolos-tiny")
image_processor = YolosImageProcessor.from_pretrained("hustvl/yolos-tiny")

# Preprocess the image and run inference
inputs = image_processor(images=image, return_tensors="pt")
outputs = model(**inputs)

# Raw class logits and predicted (normalized) bounding boxes
logits = outputs.logits
bboxes = outputs.pred_boxes

# Convert the outputs to (x_min, y_min, x_max, y_max) boxes in pixel coordinates,
# keeping only detections with a confidence score above 0.9
target_sizes = torch.tensor([image.size[::-1]])
results = image_processor.post_process_object_detection(outputs, threshold=0.9, target_sizes=target_sizes)[0]

for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    box = [round(i, 2) for i in box.tolist()]
    print(
        f"Detected {model.config.id2label[label.item()]} with confidence "
        f"{round(score.item(), 3)} at location {box}"
    )
Currently, both the image processor and the model support PyTorch.
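To connect these outputs to the object queries described under Technical Details, it can help to inspect the tensor shapes. The dimensions below are what the yolos-tiny checkpoint is expected to produce (91 COCO labels plus a "no object" class); verify them against your installed version, as they are stated here as an assumption rather than quoted from the original card.

# One prediction per detection token (object query):
print(logits.shape)  # expected: torch.Size([1, 100, 92]) -> (batch, queries, classes + "no object")
print(bboxes.shape)  # expected: torch.Size([1, 100, 4])  -> (batch, queries, normalized box coordinates)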
📚 Documentation
Intended uses & limitations
You can use the raw model for object detection. See the model hub for all available YOLOS models.
🔧 Technical Details
The model is trained using a "bipartite matching loss": one compares the predicted classes and bounding boxes of each of the N = 100 object queries to the ground-truth annotations, padded up to the same length N. The Hungarian matching algorithm is used to create an optimal one-to-one mapping between each of the N queries and each of the N annotations. Next, standard cross-entropy (for the classes) and a linear combination of the L1 and generalized IoU loss (for the bounding boxes) are used to optimize the parameters of the model.
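The matching step can be illustrated with a short, self-contained sketch. This is not the model's actual training code: the xyxy box format and the cost weights (1.0, 5.0, 2.0, following DETR's published defaults) are assumptions for illustration, using SciPy's linear_sum_assignment as the Hungarian solver and torchvision's generalized_box_iou for the overlap term.

import torch
from scipy.optimize import linear_sum_assignment
from torchvision.ops import generalized_box_iou

def hungarian_match(pred_logits, pred_boxes, gt_labels, gt_boxes,
                    w_class=1.0, w_l1=5.0, w_giou=2.0):
    # pred_logits: (N, num_classes) raw scores for N object queries
    # pred_boxes:  (N, 4) boxes in (x_min, y_min, x_max, y_max) format
    # gt_labels:   (M,) ground-truth class indices, M <= N
    # gt_boxes:    (M, 4) ground-truth boxes in the same format
    prob = pred_logits.softmax(-1)
    cost_class = -prob[:, gt_labels]                        # (N, M): likely class -> low cost
    cost_l1 = torch.cdist(pred_boxes, gt_boxes, p=1)        # (N, M): L1 box distance
    cost_giou = -generalized_box_iou(pred_boxes, gt_boxes)  # (N, M): high overlap -> low cost
    cost = w_class * cost_class + w_l1 * cost_l1 + w_giou * cost_giou
    # Hungarian algorithm: optimal one-to-one assignment of queries to annotations
    pred_idx, gt_idx = linear_sum_assignment(cost.detach().numpy())
    return pred_idx, gt_idx

# Toy usage: match 100 query predictions against 3 annotated objects
pred_logits = torch.randn(100, 92)
top_left = torch.rand(100, 2)
pred_boxes = torch.cat([top_left, top_left + torch.rand(100, 2)], dim=-1)
gt_labels = torch.tensor([17, 17, 65])  # illustrative COCO-style class ids
gt_tl = torch.rand(3, 2)
gt_boxes = torch.cat([gt_tl, gt_tl + torch.rand(3, 2)], dim=-1)
print(hungarian_match(pred_logits, pred_boxes, gt_labels, gt_boxes))

In the actual loss, queries left unmatched by this assignment are supervised toward the "no object" class; the sketch above only shows the assignment step.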
📄 License
This model is released under the Apache 2.0 license.
BibTeX entry and citation info
@article{DBLP:journals/corr/abs-2106-00666,
author = {Yuxin Fang and
Bencheng Liao and
Xinggang Wang and
Jiemin Fang and
Jiyang Qi and
Rui Wu and
Jianwei Niu and
Wenyu Liu},
title = {You Only Look at One Sequence: Rethinking Transformer in Vision through
Object Detection},
journal = {CoRR},
volume = {abs/2106.00666},
year = {2021},
url = {https://arxiv.org/abs/2106.00666},
eprinttype = {arXiv},
eprint = {2106.00666},
timestamp = {Fri, 29 Apr 2022 19:49:16 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-2106-00666.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}