
YOLOS Base

Developed by hustvl
YOLOS is a Vision Transformer (ViT)-based object detection model trained with the DETR bipartite matching loss, achieving 42 AP on the COCO 2017 validation set.
Downloads 2,638
Release Date: 4/26/2022

Model Overview

YOLOS is a Vision Transformer (ViT) trained with the DETR loss, designed specifically for object detection. On the COCO 2017 validation set it matches the performance of more complex frameworks such as DETR and Faster R-CNN.
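A detection pass with this model can be sketched using the Hugging Face `transformers` library (a minimal sketch only: the `hustvl/yolos-base` checkpoint name and the `post_process_object_detection` helper are assumed from recent `transformers` releases; the blank placeholder image is for illustration):

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForObjectDetection

# Assumed checkpoint name on the Hugging Face Hub.
CHECKPOINT = "hustvl/yolos-base"
processor = AutoImageProcessor.from_pretrained(CHECKPOINT)
model = AutoModelForObjectDetection.from_pretrained(CHECKPOINT)

# Placeholder image; replace with a real photo for meaningful detections.
image = Image.new("RGB", (640, 480))
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Convert raw logits and normalized boxes to labeled boxes
# in absolute pixel coordinates of the input image.
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
results = processor.post_process_object_detection(
    outputs, threshold=0.9, target_sizes=target_sizes
)[0]

for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(model.config.id2label[label.item()], round(score.item(), 3), box.tolist())
```

Each entry in `results` pairs a confidence score with a COCO class label and an `(x1, y1, x2, y2)` bounding box.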

Model Features

Transformer-based object detection
YOLOS adopts the Vision Transformer architecture, recasting object detection as a sequence prediction problem and dispensing with much of the hand-crafted complexity of traditional detection pipelines.
Bipartite matching loss
A Hungarian matching algorithm establishes the optimal one-to-one correspondence between predictions and ground-truth annotations; the model is then optimized with a combination of cross-entropy, L1, and generalized IoU losses.
High performance
Achieves 42 AP on the COCO 2017 validation set, on par with DETR and more complex frameworks such as Faster R-CNN.
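The bipartite matching step above can be illustrated with a toy example (a sketch only: SciPy's Hungarian solver stands in for the training-time matcher, the boxes are made up, and the classification cost term of the full DETR-style cost is omitted):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def box_giou(a, b):
    """Generalized IoU for axis-aligned boxes in (x1, y1, x2, y2) format."""
    # Intersection rectangle (empty if the boxes do not overlap).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    iou = inter / union
    # Smallest box enclosing both; GIoU penalizes empty enclosing area,
    # which gives a useful gradient even for non-overlapping boxes.
    c_area = ((max(a[2], b[2]) - min(a[0], b[0]))
              * (max(a[3], b[3]) - min(a[1], b[1])))
    return iou - (c_area - union) / c_area


# Toy example: 3 predicted boxes vs. 2 ground-truth boxes.
pred = np.array([[0.4, 0.4, 0.6, 0.6],
                 [0.0, 0.0, 0.1, 0.1],
                 [0.7, 0.7, 1.0, 1.0]])
gt = np.array([[0.68, 0.66, 1.0, 1.0],
               [0.42, 0.40, 0.62, 0.60]])

# Pairwise cost: L1 distance minus GIoU (more overlap -> lower cost).
cost = np.zeros((len(pred), len(gt)))
for i, p in enumerate(pred):
    for j, g in enumerate(gt):
        cost[i, j] = np.abs(p - g).sum() - box_giou(p, g)

# Hungarian algorithm: optimal one-to-one assignment minimizing total cost.
pred_idx, gt_idx = linear_sum_assignment(cost)
```

Here the first and third predictions are matched to their nearby ground-truth boxes, and the unmatched prediction would be trained toward the "no object" class; the same L1 and GIoU terms then serve as the box regression loss for the matched pairs.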

Model Capabilities

Object detection
Image analysis
Bounding box prediction

Use Cases

Computer vision
Scene understanding
Detects objects and their positions in images, suitable for scenarios such as surveillance and autonomous driving.
Accurately identifies and locates multiple objects in images.
Image annotation
Automatically generates annotations for images, including object categories and positions.
Provides high-quality image annotations, reducing manual labeling costs.