# Mask2Former

The Mask2Former model is trained on ADE20k semantic segmentation (large-sized version, Swin backbone). It offers a unified approach to multiple image segmentation tasks.
## Quick Start

Mask2Former handles instance, semantic, and panoptic segmentation within the same framework. It was introduced in the paper [Masked-attention Mask Transformer for Universal Image Segmentation](https://arxiv.org/abs/2112.01527) and first released in [this repository](https://github.com/facebookresearch/Mask2Former).
## Features

- Unified segmentation paradigm: addresses instance, semantic, and panoptic segmentation by predicting a set of masks and corresponding labels, treating all three tasks as instance segmentation.
- Performance and efficiency: outperforms the previous state of the art, MaskFormer, through several improvements:
  - Replaces the pixel decoder with a more advanced multi-scale deformable attention Transformer.
  - Adopts a Transformer decoder with masked attention to boost performance without extra computation.
  - Improves training efficiency by calculating the loss on subsampled points instead of whole masks (see the sketch after this list).
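To make the point-subsampled loss concrete, here is a minimal sketch (not from the original card) that evaluates a binary mask loss at uniformly sampled point coordinates. `point_sampled_bce` is a hypothetical helper name, and the real training recipe additionally uses importance sampling of uncertain points rather than purely uniform sampling:

```python
import torch
import torch.nn.functional as F

def point_sampled_bce(pred_logits, gt_masks, num_points=12544):
    """pred_logits, gt_masks: (N, 1, H, W) mask logits and float binary targets."""
    n = pred_logits.shape[0]
    # Uniform random point coordinates in [-1, 1], the range grid_sample expects
    coords = torch.rand(n, num_points, 1, 2, device=pred_logits.device) * 2 - 1
    # Evaluate prediction and target at the same sparse points: (N, num_points)
    pred_pts = F.grid_sample(pred_logits, coords, align_corners=False).flatten(1)
    gt_pts = F.grid_sample(gt_masks, coords, align_corners=False).flatten(1)
    # The loss is computed on num_points samples instead of all H * W pixels
    return F.binary_cross_entropy_with_logits(pred_pts, gt_pts)
```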
## Installation

No specific installation steps are provided in the original document. The usage example below assumes the `transformers`, `torch`, `Pillow`, and `requests` packages are installed.
## Usage Examples

### Basic Usage
```python
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation

# Load the image processor and the model from the Hugging Face Hub
processor = AutoImageProcessor.from_pretrained("facebook/mask2former-swin-large-ade-semantic")
model = Mask2FormerForUniversalSegmentation.from_pretrained("facebook/mask2former-swin-large-ade-semantic")

# Download an example image
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Preprocess the image and run inference
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# The model predicts class logits and mask logits per query
class_queries_logits = outputs.class_queries_logits
masks_queries_logits = outputs.masks_queries_logits

# Post-process into a (height, width) semantic segmentation map
predicted_semantic_map = processor.post_process_semantic_segmentation(
    outputs, target_sizes=[image.size[::-1]]
)[0]
```
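The returned `predicted_semantic_map` is a `(height, width)` tensor of ADE20k class indices. As a quick follow-up (not part of the original card), the indices can be mapped to class names via `model.config.id2label`:

```python
# List the class names present in the prediction (continues the example above)
labels = [model.config.id2label[i] for i in predicted_semantic_map.unique().tolist()]
print(labels)
```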
### Advanced Usage

For more code examples, refer to the [Mask2Former documentation](https://huggingface.co/docs/transformers/model_doc/mask2former).
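The same processor also exposes instance and panoptic post-processing. A minimal sketch (not from the original card), assuming a panoptic checkpoint such as `facebook/mask2former-swin-base-coco-panoptic` is loaded in place of the semantic one:

```python
# Post-process into a panoptic segmentation map plus per-segment metadata
result = processor.post_process_panoptic_segmentation(
    outputs, target_sizes=[image.size[::-1]]
)[0]
segmentation = result["segmentation"]    # (height, width) map of segment ids
for segment in result["segments_info"]:  # one dict per predicted segment
    print(segment["id"], model.config.id2label[segment["label_id"]])
```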
## Documentation

### Model description

Mask2Former addresses instance, semantic, and panoptic segmentation with the same paradigm: predicting a set of masks and corresponding labels. All three tasks are therefore treated as if they were instance segmentation. Mask2Former outperforms the previous SOTA, MaskFormer, in both performance and efficiency by (i) replacing the pixel decoder with a more advanced multi-scale deformable attention Transformer, (ii) adopting a Transformer decoder with masked attention to boost performance without introducing additional computation, and (iii) improving training efficiency by calculating the loss on subsampled points instead of whole masks.
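As an illustration of point (ii), here is a minimal single-head sketch (not from the original card) of masked attention: each query attends only to locations that the previous decoder layer's mask prediction marks as foreground, falling back to full attention when a predicted mask is empty. `masked_attention` is a hypothetical name and the tensor shapes are simplified:

```python
import torch

def masked_attention(q, k, v, mask_logits):
    """q: (queries, d); k, v: (pixels, d); mask_logits: (queries, pixels)."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    blocked = mask_logits.sigmoid() < 0.5      # attend only to predicted foreground
    empty = blocked.all(dim=-1, keepdim=True)  # full attention if a mask is empty
    scores = scores.masked_fill(blocked & ~empty, float("-inf"))
    return scores.softmax(dim=-1) @ v
```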

### Intended uses & limitations

You can use this particular checkpoint for semantic segmentation. See the [model hub](https://huggingface.co/models?search=mask2former) to look for other fine-tuned versions on a task that interests you.
## Technical Details

No specific technical details beyond the model description are provided in the original document.
## License

The license for this model is "other".
| Property | Details |
|----------|---------|
| Model Type | Mask2Former trained on ADE20k semantic segmentation (large-sized version, Swin backbone) |
| Training Data | ADE20k |