Mask2Former
The Mask2Former model, trained on COCO instance segmentation (tiny-sized version, Swin backbone), offers a unified approach for various image segmentation tasks.
Quick Start
This Mask2Former checkpoint is trained on COCO instance segmentation and can be used for instance segmentation tasks. You can find other fine-tuned versions on the model hub.
Features
- Unified Paradigm: Mask2Former addresses instance, semantic, and panoptic segmentation using the same approach, treating all three tasks as instance segmentation by predicting a set of masks and corresponding labels.
- Performance and Efficiency: It outperforms the previous SOTA, MaskFormer, in both performance and efficiency. This is achieved through:
  - Replacing the pixel decoder with a more advanced multi-scale deformable attention Transformer.
  - Adopting a Transformer decoder with masked attention to boost performance without additional computation.
  - Improving training efficiency by calculating the loss on subsampled points instead of whole masks (see the sketch after this list).
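The point-sampling idea can be illustrated in a few lines. Below is a minimal, hypothetical sketch (the function name, shapes, and uniform random sampling are assumptions for illustration; the actual implementation additionally biases sampling toward uncertain points, PointRend-style):

```python
import torch
import torch.nn.functional as F

def point_sampled_mask_loss(mask_logits, mask_targets, num_points=12544):
    """Toy sketch: compute BCE on randomly subsampled points rather than on
    full-resolution masks. mask_logits, mask_targets: (num_masks, H, W)."""
    num_masks = mask_logits.shape[0]
    # Random normalized (x, y) coordinates in [-1, 1], as grid_sample expects.
    coords = torch.rand(num_masks, num_points, 1, 2) * 2 - 1
    # Sample logits and targets at the same point locations.
    sampled_logits = F.grid_sample(
        mask_logits.unsqueeze(1), coords, align_corners=False
    ).squeeze(1).squeeze(-1)                      # (num_masks, num_points)
    sampled_targets = F.grid_sample(
        mask_targets.unsqueeze(1).float(), coords, align_corners=False
    ).squeeze(1).squeeze(-1)
    return F.binary_cross_entropy_with_logits(sampled_logits, sampled_targets)
```

With 12544 sampled points (the budget reported in the paper, equivalent to a 112 x 112 grid), this is far cheaper than computing the loss over full-resolution masks.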
Documentation
Model description
Mask2Former addresses instance, semantic, and panoptic segmentation with the same paradigm: by predicting a set of masks and corresponding labels. Hence, all three tasks are treated as if they were instance segmentation. Mask2Former outperforms the previous SOTA, MaskFormer, in both performance and efficiency by (i) replacing the pixel decoder with a more advanced multi-scale deformable attention Transformer, (ii) adopting a Transformer decoder with masked attention to boost performance without introducing additional computation, and (iii) improving training efficiency by calculating the loss on subsampled points instead of whole masks.
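To make the masked-attention step concrete, here is a minimal sketch in plain PyTorch (the function name and the single-head, dot-product formulation are simplifications for illustration; the real model applies this inside a multi-head Transformer decoder). Each query attends only to pixel locations inside the mask it predicted in the previous decoder layer:

```python
import torch

def masked_attention(q, k, v, mask_probs, threshold=0.5):
    """Toy sketch of masked attention.
    q: (num_queries, d); k, v: (num_pixels, d);
    mask_probs: (num_queries, num_pixels) per-query mask probabilities."""
    scores = q @ k.T / q.shape[-1] ** 0.5        # (num_queries, num_pixels)
    blocked = mask_probs < threshold             # True = outside the mask
    # A query with an entirely empty mask would yield all -inf scores and a
    # NaN softmax; fall back to full attention for such queries.
    blocked = blocked & ~blocked.all(dim=-1, keepdim=True)
    attn = scores.masked_fill(blocked, float("-inf")).softmax(dim=-1)
    return attn @ v
```

Restricting attention to the predicted foreground focuses each query on local object features, which is what allows the decoder to improve accuracy without extra computation.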

Intended uses & limitations
You can use this particular checkpoint for instance segmentation. See the model hub to look for other fine-tuned versions on a task that interests you.
Usage Examples
Basic Usage
```python
import requests
import torch
from PIL import Image

from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation

# Load the image processor and the COCO instance-segmentation checkpoint.
processor = AutoImageProcessor.from_pretrained("facebook/mask2former-swin-tiny-coco-instance")
model = Mask2FormerForUniversalSegmentation.from_pretrained("facebook/mask2former-swin-tiny-coco-instance")

# Fetch a test image from the COCO validation set.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# The model predicts class logits and mask logits for each query.
class_queries_logits = outputs.class_queries_logits
masks_queries_logits = outputs.masks_queries_logits

# Post-process into an instance segmentation map at the original image size
# (PIL gives (width, height), so the size is reversed to (height, width)).
result = processor.post_process_instance_segmentation(outputs, target_sizes=[image.size[::-1]])[0]
predicted_instance_map = result["segmentation"]
```
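The post-processed result also contains a `segments_info` list describing each detected instance, which can be inspected as follows (the printed labels depend on the image and on the checkpoint's `id2label` mapping):

```python
# Each entry describes one detected instance: its id in the segmentation
# map, its predicted class, and a confidence score.
for segment in result["segments_info"]:
    label = model.config.id2label[segment["label_id"]]
    print(f"instance {segment['id']}: {label} (score {segment['score']:.2f})")
```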
For more code examples, we refer to the documentation.
License
This model is released under a license tagged as "other" on the model hub.
Additional Information
| Property | Details |
|----------|---------|
| Tags | vision, image-segmentation |
| Datasets | coco |
| Widget Examples | Cats, Castle |
Disclaimer: The team releasing Mask2Former did not write a model card for this model, so this model card has been written by the Hugging Face team. The paper introducing the model is Masked-attention Mask Transformer for Universal Image Segmentation, and it was first released in this repository.