yolos-small-finetuned-masks Open-source Model - Free Deployment for Precise Mask Detection

Yolos Small Finetuned Masks

Developed by nickmuchi

A small Vision Transformer model based on the YOLOS architecture, fine-tuned on COCO object detection and then on a face mask dataset for mask detection.

Object Detection

Transformers

Open Source License: Apache-2.0

#Mask Detection #ViT Architecture #Small Object Detection

Downloads 153

Release Time: 6/17/2022

Model Overview

This is an object detection model based on the Vision Transformer (ViT). It was trained on the COCO dataset and then fine-tuned for mask detection, and it distinguishes three states: 'wearing mask', 'not wearing mask', and 'improperly wearing mask'.

Model Features

Efficient Vision Transformer Architecture

Uses a plain ViT architecture trained with the DETR loss function, keeping the structure simple while still reaching good detection accuracy.

Specialized Mask Detection Optimization

Fine-tuned for 200 epochs on a face mask dataset of 853 annotated images.

Multi-scenario Adaptation

Evaluation results cover small, medium, and large objects: AP@[0.50:0.95] is 0.220 for small, 0.341 for medium, and 0.545 for large objects, so detection is noticeably stronger on larger faces.

Model Capabilities

Image Object Detection

Mask Wearing Status Recognition

Crowd Scene Analysis

Use Cases

Public Health Monitoring

Public Place Mask Wearing Monitoring

Real-time monitoring of mask wearing status in public places like malls and stations

Achieves 53.2% average precision (AP@0.5)

Smart Security

Access Control System

Integrated into access control systems to automatically detect mask wearing status

🚀 YOLOS (small-sized) model

This is the small-sized YOLOS model, fine-tuned first for COCO object detection and then for face mask detection.

🚀 Quick Start

The original YOLOS model was fine-tuned on COCO 2017 object detection (118k annotated images). It was introduced in the paper "You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection" by Fang et al. and first released in this repository.

This model was further fine-tuned on the face mask dataset from Kaggle. The dataset consists of 853 images of people, with annotations categorised as "with mask", "without mask", and "mask not worn correctly". The model was trained for 200 epochs on a single GPU using Google Colab.

✨ Features

Object Detection: Starts from a YOLOS checkpoint trained for general object detection on COCO 2017.
Face Mask Detection: Fine-tuned to detect faces in three states: with mask, without mask, and mask not worn correctly.
Based on Vision Transformer: A ViT trained with the DETR loss; a base-sized YOLOS model reaches 42 AP on COCO validation 2017.

📚 Documentation

Model description

YOLOS is a Vision Transformer (ViT) trained using the DETR loss. Despite its simplicity, a base-sized YOLOS model is able to achieve 42 AP on COCO validation 2017 (similar to DETR and more complex frameworks such as Faster R-CNN).
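The DETR loss mentioned above first pairs each ground-truth object with exactly one prediction (bipartite matching) and only then computes the classification and box losses on those pairs. The following is a simplified sketch of that matching step, not the actual YOLOS training code: it assumes a cost of negative class probability plus L1 box distance, leaves out the generalized IoU term and the cost weights that DETR also uses, and searches all assignments by brute force instead of running the Hungarian algorithm. The function names are hypothetical.

```python
from itertools import permutations

def l1_distance(box_a, box_b):
    """Sum of absolute coordinate differences between two boxes."""
    return sum(abs(a - b) for a, b in zip(box_a, box_b))

def match(pred_probs, pred_boxes, target_labels, target_boxes):
    """Return, for each target, the index of its matched prediction.

    Simplified cost per pair: -P(target class) + L1 box distance.
    Brute force over assignments; only suitable for tiny examples.
    """
    best_cost, best_assignment = float("inf"), ()
    for assignment in permutations(range(len(pred_boxes)), len(target_labels)):
        cost = sum(
            -pred_probs[p][target_labels[t]] + l1_distance(pred_boxes[p], target_boxes[t])
            for t, p in enumerate(assignment)
        )
        if cost < best_cost:
            best_cost, best_assignment = cost, assignment
    return list(best_assignment)

# Three predictions, two ground-truth objects (boxes are normalized cx, cy, w, h).
pred_probs = [[0.1, 0.8, 0.1], [0.9, 0.05, 0.05], [0.3, 0.3, 0.4]]
pred_boxes = [(0.7, 0.7, 0.2, 0.2), (0.3, 0.3, 0.2, 0.2), (0.5, 0.5, 0.9, 0.9)]
target_labels = [0, 1]
target_boxes = [(0.3, 0.3, 0.2, 0.2), (0.7, 0.7, 0.2, 0.2)]

matches = match(pred_probs, pred_boxes, target_labels, target_boxes)
print(matches)  # [1, 0]: target 0 pairs with prediction 1, target 1 with prediction 0
```

Predictions left unmatched (here, prediction 2) are trained towards the "no object" class, which is why the model can emit a fixed number of queries per image without a separate non-maximum suppression step.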

Intended uses & limitations

You can use the raw model for object detection. See the model hub to look for all available YOLOS models.

How to use

Here is how to use this model:

from transformers import YolosImageProcessor, YolosForObjectDetection
from PIL import Image
import requests
import torch

# Download a sample image and make sure it has three colour channels
url = 'https://drive.google.com/uc?id=1VwYLbGak5c-2P5qdvfWVOeg7DTDYPbro'
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# YolosImageProcessor replaces the deprecated YolosFeatureExtractor in recent Transformers releases
image_processor = YolosImageProcessor.from_pretrained('nickmuchi/yolos-small-finetuned-masks')
model = YolosForObjectDetection.from_pretrained('nickmuchi/yolos-small-finetuned-masks')

inputs = image_processor(images=image, return_tensors="pt")
with torch.no_grad():  # inference only, so no gradients are needed
    outputs = model(**inputs)

# The model predicts bounding boxes and the corresponding face mask detection classes
logits = outputs.logits
bboxes = outputs.pred_boxes

Currently, both the feature extractor and model support PyTorch.
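The raw `logits` and `pred_boxes` from the snippet above still have to be decoded before they can be drawn or counted. The sketch below shows that step in plain Python, assuming the usual DETR-style output layout: one row of logits per query with a trailing "no object" class, and boxes as normalized (center x, center y, width, height). The `ID2LABEL` mapping, the `decode` helper, and the 0.9 threshold are illustrative assumptions; the real label mapping should be read from `model.config.id2label`.

```python
import math

# Hypothetical label mapping for illustration; read the real one from model.config.id2label.
ID2LABEL = {0: "with mask", 1: "without mask", 2: "mask not worn correctly"}

def softmax(row):
    """Numerically stable softmax over one row of logits."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def decode(logits, boxes, image_size, threshold=0.9):
    """Convert per-query logits and normalized boxes into pixel-space detections."""
    width, height = image_size
    detections = []
    for row, (cx, cy, w, h) in zip(logits, boxes):
        probs = softmax(row)[:-1]  # drop the trailing "no object" class
        score = max(probs)
        if score < threshold:
            continue  # low-confidence or "no object" query
        label = ID2LABEL[probs.index(score)]
        # Center format -> corner format, scaled to pixels.
        box = (
            (cx - w / 2) * width,
            (cy - h / 2) * height,
            (cx + w / 2) * width,
            (cy + h / 2) * height,
        )
        detections.append((label, score, box))
    return detections

# Two toy queries: one confident "with mask", one confident "no object".
toy_logits = [[6.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 6.0]]
toy_boxes = [(0.5, 0.5, 0.2, 0.4), (0.1, 0.1, 0.1, 0.1)]
dets = decode(toy_logits, toy_boxes, image_size=(640, 480))
for label, score, box in dets:
    print(label, round(score, 3), [round(v) for v in box])
```

In practice the image processor's `post_process_object_detection` method performs this decoding on the model outputs directly; the hand-written version is only meant to make the output format visible.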

Training data

The YOLOS model was pre-trained on ImageNet-1k and fine-tuned on COCO 2017 object detection, a dataset consisting of 118k/5k annotated images for training/validation respectively.

Training

This model was fine-tuned for 200 epochs on the face-mask-dataset.
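Before fine-tuning, the annotations have to be brought into the format the training pipeline expects. The sketch below illustrates one plausible preparation step under two assumptions that are not stated in this card: that the Kaggle annotations are Pascal VOC-style XML files with corner coordinates, and that training consumes COCO-style records with `[x, y, width, height]` boxes. The class names in `CATEGORY_IDS`, the sample XML, and the `voc_to_coco` helper are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical class names and ids; check the dataset's own annotation files.
CATEGORY_IDS = {"with_mask": 0, "without_mask": 1, "mask_weared_incorrect": 2}

# Hypothetical annotation in Pascal VOC style (corner coordinates in pixels).
SAMPLE_XML = """
<annotation>
  <filename>example.png</filename>
  <size><width>400</width><height>300</height></size>
  <object>
    <name>with_mask</name>
    <bndbox><xmin>100</xmin><ymin>50</ymin><xmax>160</xmax><ymax>130</ymax></bndbox>
  </object>
  <object>
    <name>without_mask</name>
    <bndbox><xmin>200</xmin><ymin>60</ymin><xmax>240</xmax><ymax>100</ymax></bndbox>
  </object>
</annotation>
"""

def voc_to_coco(xml_text, image_id):
    """Convert one VOC-style annotation into a COCO-style detection record."""
    root = ET.fromstring(xml_text.strip())
    annotations = []
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            int(bndbox.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        w, h = xmax - xmin, ymax - ymin
        annotations.append({
            "bbox": [xmin, ymin, w, h],  # COCO uses top-left corner plus size
            "category_id": CATEGORY_IDS[obj.find("name").text],
            "area": w * h,
            "iscrowd": 0,
        })
    return {"image_id": image_id, "annotations": annotations}

record = voc_to_coco(SAMPLE_XML, image_id=0)
print(record["annotations"][0]["bbox"])  # [100, 50, 60, 80]
```

With only 853 images, a held-out validation split made at this stage is what makes evaluation figures such as AP meaningful.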

Evaluation results

This model achieves an AP (average precision) of 53.2 at an IoU threshold of 0.50 (0.532 in the table below). Averaged over IoU thresholds from 0.50 to 0.95, the AP is 27.3.

Metric	IoU	Area	MaxDets	Value
Average Precision (AP)	0.50:0.95	all	100	0.273
Average Precision (AP)	0.50	all	100	0.532
Average Precision (AP)	0.75	all	100	0.257
Average Precision (AP)	0.50:0.95	small	100	0.220
Average Precision (AP)	0.50:0.95	medium	100	0.341
Average Precision (AP)	0.50:0.95	large	100	0.545
Average Recall (AR)	0.50:0.95	all	1	0.154
Average Recall (AR)	0.50:0.95	all	10	0.361
Average Recall (AR)	0.50:0.95	all	100	0.415
Average Recall (AR)	0.50:0.95	small	100	0.349
Average Recall (AR)	0.50:0.95	medium	100	0.469
Average Recall (AR)	0.50:0.95	large	100	0.584
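The IoU thresholds in the table decide when a predicted box counts as a correct detection: the prediction must overlap a ground-truth box by at least that intersection-over-union ratio. A minimal sketch of the computation, with hypothetical helper names and boxes given as (x0, y0, x1, y1) corners:

```python
def iou(box_a, box_b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix1, iy1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    intersection = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

def is_match(predicted, ground_truth, threshold):
    """True if the predicted box overlaps the ground truth enough to count as correct."""
    return iou(predicted, ground_truth) >= threshold

predicted = (0, 0, 10, 10)
ground_truth = (2, 0, 12, 10)
print(round(iou(predicted, ground_truth), 3))   # 0.667
print(is_match(predicted, ground_truth, 0.50))  # True
print(is_match(predicted, ground_truth, 0.75))  # False
```

The same prediction passes at IoU 0.50 and fails at 0.75, which is why AP at 0.50 (0.532) is so much higher than AP at 0.75 (0.257) or the 0.50:0.95 average (0.273).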

📄 License

This model is licensed under the Apache-2.0 license.
