Mask2Former
The Mask2Former model, trained on COCO instance segmentation (tiny-sized version, Swin backbone), offers a unified approach for various image segmentation tasks.
Quick Start
This Mask2Former checkpoint is trained on COCO instance segmentation and can be used for instance segmentation tasks. You can find other fine-tuned versions on the model hub.
Features
- Unified Paradigm: Mask2Former addresses instance, semantic, and panoptic segmentation using the same approach, treating all three tasks as instance segmentation by predicting a set of masks and corresponding labels.
- Performance and Efficiency: It outperforms the previous SOTA, MaskFormer, in both performance and efficiency. This is achieved through:
  - Replacing the pixel decoder with a more advanced multi-scale deformable attention Transformer.
  - Adopting a Transformer decoder with masked attention to boost performance without additional computation.
  - Improving training efficiency by calculating the loss on subsampled points instead of whole masks (see the sketch after this list).
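The point-sampling idea can be illustrated in a few lines. Below is a minimal, hypothetical sketch (the function name, shapes, and uniform random sampling are assumptions for illustration; the actual implementation additionally biases sampling toward uncertain points, PointRend-style):

```python
import torch
import torch.nn.functional as F

def point_sampled_mask_loss(mask_logits, mask_targets, num_points=12544):
    """Toy sketch: compute BCE on randomly subsampled points rather than on
    full-resolution masks. mask_logits, mask_targets: (num_masks, H, W)."""
    num_masks = mask_logits.shape[0]
    # Random normalized (x, y) coordinates in [-1, 1], as grid_sample expects.
    coords = torch.rand(num_masks, num_points, 1, 2) * 2 - 1
    # Sample logits and targets at the same point locations.
    sampled_logits = F.grid_sample(
        mask_logits.unsqueeze(1), coords, align_corners=False
    ).squeeze(1).squeeze(-1)                      # (num_masks, num_points)
    sampled_targets = F.grid_sample(
        mask_targets.unsqueeze(1).float(), coords, align_corners=False
    ).squeeze(1).squeeze(-1)
    return F.binary_cross_entropy_with_logits(sampled_logits, sampled_targets)
```

With 12544 sampled points (the budget reported in the paper, equivalent to a 112 x 112 grid), this is far cheaper than computing the loss over full-resolution masks.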
Documentation
Model description
Mask2Former addresses instance, semantic, and panoptic segmentation with the same paradigm: by predicting a set of masks and corresponding labels. Hence, all three tasks are treated as if they were instance segmentation. Mask2Former outperforms the previous SOTA, MaskFormer, in both performance and efficiency by (i) replacing the pixel decoder with a more advanced multi-scale deformable attention Transformer, (ii) adopting a Transformer decoder with masked attention to boost performance without introducing additional computation, and (iii) improving training efficiency by calculating the loss on subsampled points instead of whole masks.
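To make the masked-attention step concrete, here is a minimal sketch in plain PyTorch (the function name and the single-head, dot-product formulation are simplifications for illustration; the real model applies this inside a multi-head Transformer decoder). Each query attends only to pixel locations inside the mask it predicted in the previous decoder layer:

```python
import torch

def masked_attention(q, k, v, mask_probs, threshold=0.5):
    """Toy sketch of masked attention.
    q: (num_queries, d); k, v: (num_pixels, d);
    mask_probs: (num_queries, num_pixels) per-query mask probabilities."""
    scores = q @ k.T / q.shape[-1] ** 0.5        # (num_queries, num_pixels)
    blocked = mask_probs < threshold             # True = outside the mask
    # A query with an entirely empty mask would yield all -inf scores and a
    # NaN softmax; fall back to full attention for such queries.
    blocked = blocked & ~blocked.all(dim=-1, keepdim=True)
    attn = scores.masked_fill(blocked, float("-inf")).softmax(dim=-1)
    return attn @ v
```

Restricting attention to the predicted foreground focuses each query on local object features, which is what allows the decoder to improve accuracy without extra computation.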

Intended uses & limitations
You can use this particular checkpoint for instance segmentation. See the model hub to look for other fine-tuned versions on a task that interests you.
Usage Examples
Basic Usage
```python
import requests
import torch
from PIL import Image

from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation

# Load the image processor and the COCO instance-segmentation checkpoint.
processor = AutoImageProcessor.from_pretrained("facebook/mask2former-swin-tiny-coco-instance")
model = Mask2FormerForUniversalSegmentation.from_pretrained("facebook/mask2former-swin-tiny-coco-instance")

# Fetch a test image from the COCO validation set.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# The model predicts class logits and mask logits for each query.
class_queries_logits = outputs.class_queries_logits
masks_queries_logits = outputs.masks_queries_logits

# Post-process into an instance segmentation map at the original image size
# (PIL gives (width, height), so the size is reversed to (height, width)).
result = processor.post_process_instance_segmentation(outputs, target_sizes=[image.size[::-1]])[0]
predicted_instance_map = result["segmentation"]
```
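The post-processed result also contains a `segments_info` list describing each detected instance, which can be inspected as follows (the printed labels depend on the image and on the checkpoint's `id2label` mapping):

```python
# Each entry describes one detected instance: its id in the segmentation
# map, its predicted class, and a confidence score.
for segment in result["segments_info"]:
    label = model.config.id2label[segment["label_id"]]
    print(f"instance {segment['id']}: {label} (score {segment['score']:.2f})")
```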
For more code examples, we refer to the documentation.
License
This model is released under a license tagged as "other" on the model hub.
Additional Information
| Property | Details |
|----------|---------|
| Tags | vision, image-segmentation |
| Datasets | coco |
| Widget Examples | Cats, Castle |
Disclaimer: The team releasing Mask2Former did not write a model card for this model, so this model card has been written by the Hugging Face team. The paper introducing the model is Masked-attention Mask Transformer for Universal Image Segmentation, and it was first released in this repository.