Mask2Former
The Mask2Former model is trained on Cityscapes instance segmentation (large-sized version, Swin backbone). It offers a unified approach for instance, semantic, and panoptic segmentation.
Quick Start
The Mask2Former model trained on Cityscapes instance segmentation (large-sized version, Swin backbone) was introduced in the paper Masked-attention Mask Transformer for Universal Image Segmentation and first released in this repository.
Disclaimer: The team releasing Mask2Former did not write a model card for this model, so this model card has been written by the Hugging Face team.
Features
- Unified Segmentation Paradigm: Mask2Former addresses instance, semantic, and panoptic segmentation using the same approach, treating all three tasks as instance segmentation by predicting a set of masks and corresponding labels.
- Performance and Efficiency: It outperforms the previous SOTA, MaskFormer, in both performance and efficiency through several key improvements:
  - Advanced Attention Mechanism: Replaces the pixel decoder with a more advanced multi-scale deformable attention Transformer.
  - Masked Attention Decoder: Adopts a Transformer decoder with masked attention to boost performance without additional computation.
  - Efficient Training: Improves training efficiency by calculating the loss on subsampled points instead of whole masks (a minimal sketch of this idea follows after this list).
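
The point-sampled loss mentioned in the last item can be illustrated with a short sketch. This is not the repository's training code: Mask2Former combines uniform and importance sampling of points (following PointRend), and the function name, purely uniform sampling, and point count used here are illustrative assumptions.

import torch
import torch.nn.functional as F

def point_sampled_mask_loss(mask_logits, gt_masks, num_points=12544):
    # Simplified sketch (assumption): binary cross-entropy evaluated only at randomly
    # sampled points instead of at every pixel of each (num_masks, H, W) mask.
    n = mask_logits.shape[0]
    # Random point coordinates in [0, 1], mapped to grid_sample's [-1, 1] range.
    point_coords = torch.rand(n, num_points, 1, 2, device=mask_logits.device)
    grid = 2.0 * point_coords - 1.0
    # Bilinearly sample predictions and targets at the same point locations.
    sampled_logits = F.grid_sample(mask_logits.unsqueeze(1), grid, align_corners=False).squeeze(1).squeeze(-1)
    sampled_targets = F.grid_sample(gt_masks.unsqueeze(1).float(), grid, align_corners=False).squeeze(1).squeeze(-1)
    return F.binary_cross_entropy_with_logits(sampled_logits, sampled_targets)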

Installation
The original card lists no dedicated installation steps; the usage example below only requires the transformers library together with torch, Pillow, and requests (for example, pip install transformers torch pillow requests).
Usage Examples
Basic Usage
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation

# Load the image processor and the Mask2Former checkpoint fine-tuned on Cityscapes instance segmentation.
processor = AutoImageProcessor.from_pretrained("facebook/mask2former-swin-large-cityscapes-instance")
model = Mask2FormerForUniversalSegmentation.from_pretrained("facebook/mask2former-swin-large-cityscapes-instance")

# Download an example image and prepare it as model input.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt")

# Run inference without tracking gradients.
with torch.no_grad():
    outputs = model(**inputs)

# The model predicts one class and one binary-mask logit map per query.
class_queries_logits = outputs.class_queries_logits
masks_queries_logits = outputs.masks_queries_logits

# Post-process the raw outputs into an instance segmentation map at the original image size.
result = processor.post_process_instance_segmentation(outputs, target_sizes=[image.size[::-1]])[0]
predicted_instance_map = result["segmentation"]
Advanced Usage
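The original card leaves this part empty. As a hedged extension of the basic example above (reusing its model, result, and predicted_instance_map objects), the sketch below shows how the per-instance metadata returned by post_process_instance_segmentation can be inspected.

# Continues from the Basic Usage example above.
# Each detected instance is listed in result["segments_info"] with an id, a class id, and a confidence score.
for segment in result["segments_info"]:
    label = model.config.id2label[segment["label_id"]]
    print(f"instance {segment['id']}: {label} (score {segment['score']:.3f})")

# The instance map assigns every pixel the id of the instance it belongs to,
# so a binary mask for a single instance can be obtained by comparison.
if result["segments_info"]:
    first_id = result["segments_info"][0]["id"]
    first_instance_mask = predicted_instance_map == first_id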
Documentation
Intended uses & limitations
You can use this particular checkpoint for instance segmentation. See the model hub for other fine-tuned versions covering a task that interests you.
How to use
For more code examples, refer to the documentation.
Technical Details
Mask2Former addresses instance, semantic, and panoptic segmentation with the same paradigm: by predicting a set of masks and corresponding labels. It outperforms MaskFormer through improvements such as replacing the pixel decoder with a multi-scale deformable attention Transformer, adopting a masked-attention Transformer decoder, and calculating the loss on subsampled points to improve training efficiency.
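
To make the set-prediction paradigm concrete, the sketch below shows one way the per-query class scores and mask logits from the usage example above can be combined into a dense per-pixel class map (at the model's internal mask resolution). This mirrors the published MaskFormer/Mask2Former inference formulation but is not the library's exact post-processing code; in practice, use the processor's post_process_* methods.

import torch

# Drop the trailing "no object" class and turn logits into probabilities.
class_probs = outputs.class_queries_logits.softmax(dim=-1)[..., :-1]  # (batch, num_queries, num_classes)
mask_probs = outputs.masks_queries_logits.sigmoid()                   # (batch, num_queries, h, w)

# Weight each query's mask by its class probabilities, sum over queries,
# and take the per-pixel argmax to obtain a dense class map.
scores = torch.einsum("bqc,bqhw->bchw", class_probs, mask_probs)
dense_class_map = scores.argmax(dim=1)                                 # (batch, h, w)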
License
The license for this model is "other".
| Property | Details |
|----------|---------|
| Model Type | Mask2Former model trained on Cityscapes instance segmentation (large-sized version, Swin backbone) |
| Training Data | Cityscapes, COCO |