🚀 YOLOS (tiny-sized) model
The YOLOS model is fine-tuned on COCO 2017 object detection and offers effective object detection capabilities.
🚀 Quick Start
The YOLOS model was fine-tuned on COCO 2017 object detection, which contains 118k annotated images. It was introduced in the paper You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection by Fang et al. and first released in this repository.
Disclaimer: The team releasing YOLOS did not write a model card for this model, so this model card has been written by the Hugging Face team.
✨ Features
- YOLOS is a Vision Transformer (ViT) trained using the DETR loss. A base-sized YOLOS model can achieve 42 AP on COCO validation 2017, similar to DETR and to more complex frameworks such as Faster R-CNN.
- The model is trained with a "bipartite matching loss": the predicted classes and bounding boxes of each of the N = 100 object queries are compared to the ground-truth annotations. The Hungarian matching algorithm creates an optimal one-to-one mapping between queries and annotations, and standard cross-entropy (for the classes) together with a linear combination of the L1 and generalized IoU losses (for the boxes) is used to optimize the model parameters.
📦 Installation
No specific installation steps are provided in the original document.
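However, the usage example below imports transformers, torch, PIL, and requests, so a typical setup (an assumption based on those imports, not taken from the original card) would be:

pip install transformers torch pillow requests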
💻 Usage Examples
Basic Usage
from transformers import YolosImageProcessor, YolosForObjectDetection
from PIL import Image
import torch
import requests

# Load an example image from the COCO 2017 validation set
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Load the model and its image processor from the Hugging Face Hub
model = YolosForObjectDetection.from_pretrained("hustvl/yolos-tiny")
image_processor = YolosImageProcessor.from_pretrained("hustvl/yolos-tiny")

# Preprocess the image and run inference
inputs = image_processor(images=image, return_tensors="pt")
outputs = model(**inputs)

# Raw class logits and predicted (normalized) bounding boxes
logits = outputs.logits
bboxes = outputs.pred_boxes

# Convert the outputs to (x_min, y_min, x_max, y_max) boxes in pixel coordinates,
# keeping only detections with a confidence score above 0.9
target_sizes = torch.tensor([image.size[::-1]])
results = image_processor.post_process_object_detection(outputs, threshold=0.9, target_sizes=target_sizes)[0]

for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    box = [round(i, 2) for i in box.tolist()]
    print(
        f"Detected {model.config.id2label[label.item()]} with confidence "
        f"{round(score.item(), 3)} at location {box}"
    )
Currently, both the image processor and the model support PyTorch.
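To connect these outputs to the object queries described under Technical Details, it can help to inspect the tensor shapes. The dimensions below are what the yolos-tiny checkpoint is expected to produce (91 COCO labels plus a "no object" class); verify them against your installed version, as they are stated here as an assumption rather than quoted from the original card.

# One prediction per detection token (object query):
print(logits.shape)  # expected: torch.Size([1, 100, 92]) -> (batch, queries, classes + "no object")
print(bboxes.shape)  # expected: torch.Size([1, 100, 4])  -> (batch, queries, normalized box coordinates)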
📚 Documentation
Intended uses & limitations
You can use the raw model for object detection. See the model hub for all available YOLOS models.
🔧 Technical Details
The model is trained using a "bipartite matching loss": one compares the predicted classes and bounding boxes of each of the N = 100 object queries to the ground-truth annotations, padded up to the same length N. The Hungarian matching algorithm is used to create an optimal one-to-one mapping between each of the N queries and each of the N annotations. Next, standard cross-entropy (for the classes) and a linear combination of the L1 and generalized IoU loss (for the bounding boxes) are used to optimize the parameters of the model.
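The matching step can be illustrated with a short, self-contained sketch. This is not the model's actual training code: the xyxy box format and the cost weights (1.0, 5.0, 2.0, following DETR's published defaults) are assumptions for illustration, using SciPy's linear_sum_assignment as the Hungarian solver and torchvision's generalized_box_iou for the overlap term.

import torch
from scipy.optimize import linear_sum_assignment
from torchvision.ops import generalized_box_iou

def hungarian_match(pred_logits, pred_boxes, gt_labels, gt_boxes,
                    w_class=1.0, w_l1=5.0, w_giou=2.0):
    # pred_logits: (N, num_classes) raw scores for N object queries
    # pred_boxes:  (N, 4) boxes in (x_min, y_min, x_max, y_max) format
    # gt_labels:   (M,) ground-truth class indices, M <= N
    # gt_boxes:    (M, 4) ground-truth boxes in the same format
    prob = pred_logits.softmax(-1)
    cost_class = -prob[:, gt_labels]                        # (N, M): likely class -> low cost
    cost_l1 = torch.cdist(pred_boxes, gt_boxes, p=1)        # (N, M): L1 box distance
    cost_giou = -generalized_box_iou(pred_boxes, gt_boxes)  # (N, M): high overlap -> low cost
    cost = w_class * cost_class + w_l1 * cost_l1 + w_giou * cost_giou
    # Hungarian algorithm: optimal one-to-one assignment of queries to annotations
    pred_idx, gt_idx = linear_sum_assignment(cost.detach().numpy())
    return pred_idx, gt_idx

# Toy usage: match 100 query predictions against 3 annotated objects
pred_logits = torch.randn(100, 92)
top_left = torch.rand(100, 2)
pred_boxes = torch.cat([top_left, top_left + torch.rand(100, 2)], dim=-1)
gt_labels = torch.tensor([17, 17, 65])  # illustrative COCO-style class ids
gt_tl = torch.rand(3, 2)
gt_boxes = torch.cat([gt_tl, gt_tl + torch.rand(3, 2)], dim=-1)
print(hungarian_match(pred_logits, pred_boxes, gt_labels, gt_boxes))

In the actual loss, queries left unmatched by this assignment are supervised toward the "no object" class; the sketch above only shows the assignment step.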
📄 License
This model is released under the Apache 2.0 license.
BibTeX entry and citation info
@article{DBLP:journals/corr/abs-2106-00666,
author = {Yuxin Fang and
Bencheng Liao and
Xinggang Wang and
Jiemin Fang and
Jiyang Qi and
Rui Wu and
Jianwei Niu and
Wenyu Liu},
title = {You Only Look at One Sequence: Rethinking Transformer in Vision through
Object Detection},
journal = {CoRR},
volume = {abs/2106.00666},
year = {2021},
url = {https://arxiv.org/abs/2106.00666},
eprinttype = {arXiv},
eprint = {2106.00666},
timestamp = {Fri, 29 Apr 2022 19:49:16 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-2106-00666.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}