yolos-small-balloon Open-source Object Detection Model - Precise Identification of Target Objects like Balloons

Yolos Small Balloon

Developed by zoheb

YOLOS is an object detection model built on the Vision Transformer (ViT) architecture, trained with the DETR loss and fine-tuned on the COCO and Matterport Balloon datasets.

Object Detection

Transformers

#Small object detection #ViT architecture #Bipartite matching loss


Release Time : 10/16/2022

Model Overview

This model uses a Transformer architecture for object detection. It is trained with a bipartite matching loss, runs on PyTorch, and is suited to object detection tasks where only a small dataset is available.

Model Features

Transformer architecture

Adopts the Vision Transformer architecture, which processes an image as a sequence for object detection.

Bipartite matching loss

Uses the Hungarian matching algorithm to establish an optimal one-to-one mapping between predictions and annotations, then optimizes the model with a cross-entropy loss and a bounding box loss.

Small-scale dataset fine-tuning

Successfully fine-tuned on the Matterport Balloon dataset (only 74 images), demonstrating adaptability to small datasets.

Model Capabilities

Object detection

Bounding box prediction

Small-scale data adaptation

Use Cases

Computer vision

Balloon detection

Detects balloon objects in images and marks their locations

Achieved 26.9 AP on Matterport Balloon validation set

🚀 🤖 YOLOS (Small-sized) Model Fine-tuned on Matterport Balloon Dataset

YOLOS is a Vision Transformer (ViT) trained using the DETR loss. Despite its simplicity, a base-sized YOLOS model is able to achieve 42 AP on COCO validation 2017 (similar to DETR and more complex frameworks such as Faster R-CNN). This model starts from a small-sized YOLOS checkpoint fine-tuned on COCO 2017 object detection (118k annotated images) and is further fine-tuned on the Matterport Balloon dataset. YOLOS was introduced in the paper You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection by Fang et al. and first released in this repository.

📚 Documentation

Model description

The model is trained using a "bipartite matching loss": one compares the predicted classes + bounding boxes of each of the N = 100 object queries to the ground truth annotations, padded up to the same length N (so if an image only contains 4 objects, 96 annotations will just have a "no object" as class and "no bounding box" as bounding box). The Hungarian matching algorithm is used to create an optimal one-to-one mapping between each of the N queries and each of the N annotations. Next, standard cross-entropy (for the classes) and a linear combination of the L1 and generalized IoU loss (for the bounding boxes) are used to optimize the parameters of the model.
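
The matching and loss described above can be sketched in plain Python. This is a minimal illustration, not the training code: it uses N = 3 queries instead of 100, corner-format boxes, a brute-force search over permutations in place of the Hungarian algorithm (both return the same optimal assignment, the brute force just does not scale), and illustrative loss weights. The names `NO_OBJECT`, `giou`, `box_loss`, `match`, and `set_loss` are hypothetical helpers introduced for this sketch.

```python
import math
from itertools import permutations

NO_OBJECT = 0  # hypothetical index of the "no object" class


def giou(a, b):
    """Generalized IoU of two (x0, y0, x1, y1) boxes with positive area."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    # Area of the smallest box enclosing both a and b.
    enclose = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return inter / union - (enclose - union) / enclose


def box_loss(pred, gt, w_l1=5.0, w_giou=2.0):
    """Linear combination of L1 and generalized IoU loss (weights are illustrative)."""
    l1 = sum(abs(p - g) for p, g in zip(pred, gt))
    return w_l1 * l1 + w_giou * (1.0 - giou(pred, gt))


def match(probs, boxes, targets):
    """Pad targets to N entries and return (padded_targets, assignment).

    assignment[i] is the index of the padded target given to query i.
    """
    n = len(probs)
    padded = list(targets) + [(NO_OBJECT, None)] * (n - len(targets))

    def cost(i, j):
        cls, box = padded[j]
        if box is None:  # padding entry: constant cost, never changes the optimum
            return 0.0
        return -probs[i][cls] + box_loss(boxes[i], box)

    # Brute-force stand-in for the Hungarian algorithm (fine for tiny N only).
    best = min(permutations(range(n)),
               key=lambda perm: sum(cost(i, j) for i, j in enumerate(perm)))
    return padded, best


def set_loss(probs, boxes, targets):
    """Cross-entropy for every query plus box loss for matched real objects."""
    padded, assignment = match(probs, boxes, targets)
    total = 0.0
    for i, j in enumerate(assignment):
        cls, box = padded[j]
        total += -math.log(probs[i][cls])
        if box is not None:
            total += box_loss(boxes[i], box)
    return total


# Three queries, two classes (0 = no object, 1 = balloon), one annotated balloon.
probs = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]
boxes = [(0.0, 0.0, 0.2, 0.2), (0.4, 0.4, 0.8, 0.8), (0.1, 0.1, 0.3, 0.3)]
targets = [(1, (0.5, 0.5, 0.9, 0.9))]

padded, assignment = match(probs, boxes, targets)
print(assignment, set_loss(probs, boxes, targets))
```

In this toy input the second query overlaps the balloon and predicts the balloon class with high probability, so it receives the real annotation while the other two queries are matched to "no object" padding.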

Currently, both the feature extractor and model support PyTorch.
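
A minimal inference sketch with the Hugging Face `transformers` library in PyTorch might look like the following. Several things here are assumptions rather than facts from this page: the hub ID `zoheb/yolos-small-balloon` is composed from the developer and model names above, the 0.9 score threshold is arbitrary, and `to_pixel_box` and `detect_balloons` are hypothetical helper names. Recent `transformers` releases expose the feature extractor as an image processor, which is what the sketch loads; if the repository ships no preprocessing config, loading the processor from `hustvl/yolos-small` is a reasonable fallback.

```python
def to_pixel_box(box, width, height):
    """Convert a normalized (cx, cy, w, h) box to pixel (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    return ((cx - w / 2) * width, (cy - h / 2) * height,
            (cx + w / 2) * width, (cy + h / 2) * height)


def detect_balloons(image_path, model_id="zoheb/yolos-small-balloon", threshold=0.9):
    """Run the detector on one image and return a list of detections."""
    # Heavy imports stay local so the helper above works without them.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, YolosForObjectDetection

    image = Image.open(image_path).convert("RGB")
    processor = AutoImageProcessor.from_pretrained(model_id)  # assumed hub ID
    model = YolosForObjectDetection.from_pretrained(model_id)
    model.eval()

    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # The last logit is the "no object" class; drop it before thresholding.
    class_probs = outputs.logits.softmax(-1)[0, :, :-1]
    scores, labels = class_probs.max(-1)
    keep = scores > threshold

    width, height = image.size
    detections = []
    for score, label, box in zip(scores[keep], labels[keep], outputs.pred_boxes[0][keep]):
        detections.append({
            "label": model.config.id2label[label.item()],
            "score": score.item(),
            "box": to_pixel_box(box.tolist(), width, height),
        })
    return detections
```

Calling `detect_balloons("party.jpg")` (a hypothetical local file) downloads the weights on first use and returns one dictionary per query whose best class score exceeds the threshold.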

Training data

This model was pre-trained on ImageNet-1k and fine-tuned on COCO 2017 object detection, a dataset consisting of 118k/5k annotated images for training/validation respectively. It was further fine-tuned on the Matterport Balloon Detection dataset, which contains 74 annotated images.

Training

The model was pre-trained for 200 epochs on ImageNet-1k, fine-tuned for 150 epochs on COCO, and further fine-tuned for 96 epochs on the Matterport Balloon dataset.

You can go through its detailed notebook here.

Evaluation results

This model achieves an AP (average precision) of 26.9 on Matterport Balloon validation.
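
To give a sense of what an AP figure measures, here is a simplified sketch. The page does not state which AP protocol was used, so this is only an illustration under stated assumptions: a single image, a single IoU threshold of 0.5, and non-interpolated precision averaging. COCO-style evaluation instead aggregates over a whole dataset and several IoU thresholds. The function names `iou` and `average_precision` are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes with positive area."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def average_precision(detections, ground_truth, iou_threshold=0.5):
    """detections: list of (score, box); ground_truth: list of boxes."""
    if not ground_truth:
        return 0.0
    matched = set()
    hits = []
    # Visit detections from most to least confident.
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_j, best_iou = None, iou_threshold
        for j, gt in enumerate(ground_truth):
            overlap = iou(box, gt)
            if j not in matched and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is None:
            hits.append(False)      # false positive
        else:
            matched.add(best_j)     # each annotation can be matched only once
            hits.append(True)
    # Average the precision measured at each true positive over all annotations.
    true_positives, total = 0, 0.0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            true_positives += 1
            total += true_positives / rank
    return total / len(ground_truth)


ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (20, 20, 30, 30))]
print(average_precision(detections, ground_truth))
```

Here the first and third detections hit an annotation and the second is a false positive, so the precision values at the two hits are 1/1 and 2/3.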

BibTeX entry and citation info

@article{DBLP:journals/corr/abs-2106-00666,
  author    = {Yuxin Fang and
               Bencheng Liao and
               Xinggang Wang and
               Jiemin Fang and
               Jiyang Qi and
               Rui Wu and
               Jianwei Niu and
               Wenyu Liu},
  title     = {You Only Look at One Sequence: Rethinking Transformer in Vision through
               Object Detection},
  journal   = {CoRR},
  volume    = {abs/2106.00666},
  year      = {2021},
  url       = {https://arxiv.org/abs/2106.00666},
  eprinttype = {arXiv},
  eprint    = {2106.00666},
  timestamp = {Fri, 29 Apr 2022 19:49:16 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2106-00666.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

📄 Information Table

Property	Details
Model Type	Vision Transformer (ViT) fine-tuned on Matterport balloon dataset
Training Data	Pre-trained on ImageNet-1k, fine-tuned on COCO 2017 object detection and Matterport Balloon Detection dataset
Fine-tuned from	hustvl/yolos-small
Tags	object-detection, vision
