Open-source Mask2Former-swin-small-coco-instance model - Efficiently complete image instance segmentation tasks

Mask2Former Swin Small COCO Instance

Developed by facebook

Mask2Former is a Transformer-based unified image segmentation model; this checkpoint is fine-tuned on the COCO dataset for instance segmentation.

Image Segmentation

Transformers

Open Source License: Other

#Unified Image Segmentation #Multi-scale Attention #Instance Segmentation Optimization

Downloads 17.51k

Release Time: 12/26/2022

Model Overview

Handles instance, semantic, and panoptic segmentation with a single paradigm: predicting a set of masks and their corresponding labels. It improves on its predecessor, MaskFormer, in both performance and efficiency.

Model Features

Unified Segmentation Architecture

Treats instance, semantic, and panoptic segmentation as if all three were instance segmentation, which simplifies the task flow.

Multi-scale Deformable Attention

Replaces MaskFormer's pixel decoder with a more advanced multi-scale deformable attention Transformer.

Masked Attention Mechanism

Uses a Transformer decoder with masked attention to boost performance without introducing additional computation.

Efficient Training Strategy

Reduces training resource consumption by computing the loss on subsampled points rather than on entire masks.

Model Capabilities

Image Instance Segmentation

Object Mask Prediction

Multi-category Object Recognition

Use Cases

Computer Vision

Object Recognition and Segmentation

Identifies objects in images and generates precise segmentation masks

Fine-tuned on the COCO dataset; the Mask2Former architecture outperforms the previous SOTA, MaskFormer, in both performance and efficiency

Scene Understanding

Analyzes object distribution and spatial relationships in complex scenes

🚀 Mask2Former

This Mask2Former model is trained on COCO instance segmentation (small-sized version, Swin backbone). It offers a unified approach for instance, semantic, and panoptic segmentation.

🚀 Quick Start

The Mask2Former model trained on COCO instance segmentation (small-sized version, Swin backbone) was introduced in the paper Masked-attention Mask Transformer for Universal Image Segmentation and first released in this repository.

Disclaimer: The team releasing Mask2Former did not write a model card for this model, so this model card has been written by the Hugging Face team.

✨ Features

Addresses instance, semantic, and panoptic segmentation with the same paradigm: predicting a set of masks and corresponding labels.
Outperforms the previous SOTA, MaskFormer, in both performance and efficiency through several improvements:
- Replaces the pixel decoder with a more advanced multi-scale deformable attention Transformer.
- Adopts a Transformer decoder with masked attention to boost performance without introducing additional computation.
- Improves training efficiency by calculating the loss on subsampled points instead of whole masks.
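
The masked attention mentioned in the second improvement can be written compactly. The block below is a sketch that assumes the formulation given in the Mask2Former paper; the symbols are notation introduced here and are not used elsewhere in this card: $\mathbf{X}_l$ denotes the query features at decoder layer $l$, $\mathbf{Q}_l$, $\mathbf{K}_l$, $\mathbf{V}_l$ the usual query, key, and value projections, and $\mathbf{M}_{l-1}$ the binarized mask predicted by the previous layer.

```latex
\mathbf{X}_l = \operatorname{softmax}\!\left(\mathcal{M}_{l-1} + \mathbf{Q}_l \mathbf{K}_l^{\top}\right)\mathbf{V}_l + \mathbf{X}_{l-1},
\qquad
\mathcal{M}_{l-1}(x, y) =
\begin{cases}
0 & \text{if } \mathbf{M}_{l-1}(x, y) = 1,\\
-\infty & \text{otherwise.}
\end{cases}
```

Under this reading, each query attends only to the foreground region of its own previous mask prediction. The mask enters as an additive term before the softmax, which is in line with the statement that masked attention introduces no additional computation.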


📚 Documentation

Model description

Mask2Former addresses instance, semantic, and panoptic segmentation with the same paradigm: by predicting a set of masks and corresponding labels. Hence, all three tasks are treated as if they were instance segmentation. Mask2Former outperforms the previous SOTA, MaskFormer, in both performance and efficiency by (i) replacing the pixel decoder with a more advanced multi-scale deformable attention Transformer, (ii) adopting a Transformer decoder with masked attention to boost performance without introducing additional computation, and (iii) improving training efficiency by calculating the loss on subsampled points instead of whole masks.
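
To make point (iii) concrete, the following is a minimal, dependency-free sketch of computing a mask loss on a subset of sampled points instead of on every pixel. It is an illustration only, not the original training code: the function names (`bce_with_logits`, `sampled_point_loss`) are hypothetical, masks are plain nested lists rather than tensors, and uniform sampling at integer pixel positions is an assumption made for brevity. The original implementation's sampling scheme is more elaborate.

```python
import math
import random


def bce_with_logits(logit: float, target: float) -> float:
    """Numerically stable binary cross-entropy for a single logit."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def sampled_point_loss(mask_logits, target_mask, num_points, rng):
    """Average BCE over `num_points` uniformly sampled pixel positions.

    mask_logits: 2-D list (H x W) of predicted mask logits.
    target_mask: 2-D list (H x W) of ground-truth labels (0 or 1).
    rng: a random.Random instance, passed in for reproducibility.
    """
    height, width = len(mask_logits), len(mask_logits[0])
    total = 0.0
    for _ in range(num_points):
        # Assumption: uniform sampling with replacement over pixel positions.
        y, x = rng.randrange(height), rng.randrange(width)
        total += bce_with_logits(mask_logits[y][x], target_mask[y][x])
    return total / num_points
```

The cost of this loss scales with `num_points` rather than with the mask resolution, which is the source of the training-efficiency gain described above.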

Intended uses & limitations

You can use this particular checkpoint for instance segmentation. See the model hub to look for other fine-tuned versions on a task that interests you.

💻 Usage Examples

Basic Usage

import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation

# load Mask2Former fine-tuned on COCO instance segmentation
processor = AutoImageProcessor.from_pretrained("facebook/mask2former-swin-small-coco-instance")
model = Mask2FormerForUniversalSegmentation.from_pretrained("facebook/mask2former-swin-small-coco-instance")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# model predicts class_queries_logits of shape `(batch_size, num_queries, num_labels + 1)`
# and masks_queries_logits of shape `(batch_size, num_queries, height, width)`
class_queries_logits = outputs.class_queries_logits
masks_queries_logits = outputs.masks_queries_logits

# you can pass them to processor for postprocessing
result = processor.post_process_instance_segmentation(outputs, target_sizes=[image.size[::-1]])[0]
# we refer to the demo notebooks for visualization (see "Resources" section in the Mask2Former docs)
predicted_instance_map = result["segmentation"]
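
Besides the segmentation map, the post-processed result also carries a `segments_info` list describing each detected instance. The helper below is a small sketch of how that list might be filtered and labelled. `summarize_instances` is a hypothetical name, and the example data and label mapping are made-up stand-ins rather than real model output; the sketch assumes each entry has `id`, `label_id`, and `score` keys.

```python
from typing import Dict, List


def summarize_instances(
    segments_info: List[dict],
    id2label: Dict[int, str],
    min_score: float = 0.5,
) -> List[dict]:
    """Keep instances scoring at least `min_score` and attach class names."""
    kept = []
    for segment in segments_info:
        if segment["score"] < min_score:
            continue
        kept.append(
            {
                "id": segment["id"],
                "label": id2label[segment["label_id"]],
                "score": segment["score"],
            }
        )
    # Highest-confidence instances first
    return sorted(kept, key=lambda s: s["score"], reverse=True)


# Hypothetical stand-ins for result["segments_info"] and model.config.id2label
example_segments = [
    {"id": 0, "label_id": 15, "score": 0.98},
    {"id": 1, "label_id": 57, "score": 0.31},
    {"id": 2, "label_id": 15, "score": 0.87},
]
example_id2label = {15: "cat", 57: "couch"}
summary = summarize_instances(example_segments, example_id2label)
```

With the real outputs from the example above, the call would be `summarize_instances(result["segments_info"], model.config.id2label)`, and a binary mask for a single instance can be obtained with `predicted_instance_map == segment_id`.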

Advanced Usage

For more code examples, we refer to the documentation.

📄 License

License: other

Property	Details
Tags	vision, image-segmentation
Datasets	coco
