Open-Source Mask2Former Image Segmentation Model: Supports Instance, Semantic, and Panoptic Segmentation

Mask2former Swin Tiny Coco Panoptic

Developed by facebook

Mask2Former is a Transformer-based unified image segmentation model that supports instance, semantic, and panoptic segmentation and uses a masked attention mechanism to improve performance

Image Segmentation

Transformers

Open Source License: Other #Panoptic Segmentation #Unified Multi-task Architecture #Swin Backbone Network

Downloads 4,538

Release Time: 1/2/2023

Model Overview

Mask2Former adopts a unified paradigm for image segmentation: it performs instance, semantic, and panoptic segmentation by predicting a set of masks and corresponding labels. Compared with previous models, its main innovations are a multi-scale deformable attention pixel decoder, a masked-attention Transformer decoder, and a loss computed on sampled points

Model Features

Unified Segmentation Architecture

Treats instance, semantic, and panoptic segmentation uniformly as mask prediction problems, simplifying the task-processing workflow
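To make the unified formulation concrete, here is a minimal NumPy sketch of how the same per-query outputs can be turned into a semantic map: each query's class probabilities weight its sigmoid mask, and each pixel takes the class with the highest summed score. The function name `semantic_inference`, the array shapes, and the convention that the last class column means "no object" are assumptions for illustration; in Transformers the processor's own `post_process_semantic_segmentation` handles this step.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def semantic_inference(class_logits, mask_logits):
    """Combine per-query outputs into a per-pixel class map (hypothetical sketch).

    class_logits: (num_queries, num_classes + 1); last column assumed to be "no object".
    mask_logits:  (num_queries, height, width).
    Returns an integer array of shape (height, width).
    """
    class_probs = softmax(class_logits, axis=-1)[:, :-1]  # drop the "no object" column
    mask_probs = sigmoid(mask_logits)
    # score[c, h, w] = sum over queries q of p(c | q) * m_q(h, w)
    scores = np.einsum("qc,qhw->chw", class_probs, mask_probs)
    return scores.argmax(axis=0)


rng = np.random.default_rng(0)
num_queries, num_classes, h, w = 5, 3, 4, 6
semantic_map = semantic_inference(
    rng.normal(size=(num_queries, num_classes + 1)),
    rng.normal(size=(num_queries, h, w)),
)
print(semantic_map.shape)  # (4, 6)
```

Because the "no object" column is dropped before summing, queries that mostly predict "no object" contribute little to any class score, so no separate per-task head is needed.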

Masked Attention Mechanism

Employs a masked-attention Transformer decoder that boosts performance without introducing additional computation
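The idea behind masked attention is that each query cross-attends only to pixels inside the mask predicted by the previous decoder layer (mask probability of at least 0.5); other positions get a logit of negative infinity before the softmax. The sketch below is a single-head NumPy illustration under that assumption. The name `masked_cross_attention` is hypothetical, the residual connection and multi-head projections are omitted, and the fallback that lets a query with an empty predicted mask attend everywhere is our own guard against an all-masked softmax row.

```python
import numpy as np


def masked_cross_attention(queries, keys, values, prev_mask_logits, threshold=0.5):
    """Single-head masked cross-attention (hypothetical sketch).

    queries:          (num_queries, d)
    keys, values:     (num_pixels, d), flattened image features
    prev_mask_logits: (num_queries, num_pixels), masks from the previous decoder layer
    """
    d = queries.shape[-1]
    logits = queries @ keys.T / np.sqrt(d)  # (num_queries, num_pixels)
    foreground = 1.0 / (1.0 + np.exp(-prev_mask_logits)) >= threshold
    # Guard (assumption): a query whose predicted mask is empty attends everywhere
    empty = ~foreground.any(axis=1, keepdims=True)
    allowed = foreground | empty
    logits = np.where(allowed, logits, -np.inf)  # additive -inf outside the foreground
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ values, weights


rng = np.random.default_rng(0)
q, p, d = 2, 6, 4
prev_masks = np.array([[5, 5, -5, -5, -5, -5], [-5] * 6], dtype=float)
out, w = masked_cross_attention(
    rng.normal(size=(q, d)), rng.normal(size=(p, d)), rng.normal(size=(p, d)), prev_masks
)
print(w[0, 2:].sum())  # 0.0: query 0 cannot attend outside its predicted mask
```

Masking only changes which logits survive the softmax, which is why it adds essentially no computation compared with standard cross-attention.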

Efficient Training Strategy

Significantly improves training efficiency by computing the loss on sampled points rather than on entire masks
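A point-sampled mask loss can be sketched as follows: draw K points in normalized image coordinates, bilinearly interpolate both the predicted mask logits and the ground-truth mask at those points, and compute binary cross-entropy plus dice loss on the samples only. This NumPy version is a simplification under stated assumptions. It samples points uniformly, whereas, to our understanding, the paper uses importance sampling for the final loss. The loss weights are omitted, and the default of 112 × 112 = 12,544 points reflects our reading of the paper's setting. The function names are hypothetical.

```python
import numpy as np


def bilinear_sample(grid, points):
    """Sample a (H, W) array at normalized (y, x) points in [0, 1) with bilinear interpolation."""
    h, w = grid.shape
    y = points[:, 0] * (h - 1)
    x = points[:, 1] * (w - 1)
    y0 = np.floor(y).astype(int)
    x0 = np.floor(x).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    top = grid[y0, x0] * (1 - wx) + grid[y0, x1] * wx
    bottom = grid[y1, x0] * (1 - wx) + grid[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def point_sampled_mask_loss(mask_logits, gt_mask, num_points=112 * 112, rng=None):
    """BCE + dice loss evaluated on uniformly sampled points (hypothetical sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    points = rng.uniform(0.0, 1.0, size=(num_points, 2))
    logits = bilinear_sample(mask_logits, points)
    targets = bilinear_sample(gt_mask.astype(float), points)
    # Numerically stable binary cross-entropy with logits
    bce = np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))
    # Dice loss on the sigmoid probabilities at the sampled points
    probs = 1.0 / (1.0 + np.exp(-logits))
    dice = 1.0 - (2.0 * (probs * targets).sum() + 1.0) / (probs.sum() + targets.sum() + 1.0)
    return bce.mean() + dice


rng = np.random.default_rng(0)
gt = np.zeros((64, 64))
gt[16:48, 16:48] = 1
good = np.where(gt > 0, 8.0, -8.0)  # confident, correct logits
loss_good = point_sampled_mask_loss(good, gt, 1024, rng)
loss_bad = point_sampled_mask_loss(-good, gt, 1024, rng)  # confident, inverted logits
print(loss_good < loss_bad)  # True
```

The cost of the loss now scales with the number of sampled points rather than with the mask resolution, which is where the memory savings during training come from.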

Model Capabilities

Image Segmentation

Instance Segmentation

Semantic Segmentation

Panoptic Segmentation

Use Cases

Computer Vision

Scene Understanding

Pixel-level identification and classification of objects in complex scenes

Can output segmentation masks with semantic labels

Autonomous Driving

Identifying various objects and drivable areas in road scenes

## 🚀 Mask2Former

*Mask2Former is a model trained on COCO panoptic segmentation (tiny-sized version, Swin backbone). It offers a unified approach for instance, semantic, and panoptic segmentation.*

## 🚀 Quick Start
Mask2Former handles instance, semantic, and panoptic segmentation with a single paradigm. The usage example below shows how to run this checkpoint with the Transformers library.

## ✨ Features
- **Unified Segmentation Paradigm**: Addresses instance, semantic, and panoptic segmentation with the same approach.
- **High Performance**: Outperforms the previous SOTA, MaskFormer, in terms of both performance and efficiency.
- **Advanced Architecture**: Utilizes a multi-scale deformable attention Transformer and a masked attention Transformer decoder.

## 📚 Documentation
### Model description
Mask2Former addresses instance, semantic, and panoptic segmentation with the same paradigm: by predicting a set of masks and corresponding labels. Hence, all 3 tasks are treated as if they were instance segmentation. Mask2Former outperforms the previous SOTA, [MaskFormer](https://arxiv.org/abs/2107.06278), in terms of both performance and efficiency by (i) replacing the pixel decoder with a more advanced multi-scale deformable attention Transformer, (ii) adopting a Transformer decoder with masked attention to boost performance without introducing additional computation, and (iii) improving training efficiency by calculating the loss on subsampled points instead of whole masks.
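Treating every task as instance segmentation means that instance predictions can be read almost directly off the queries. The NumPy sketch below ranks every (query, class) pair by class probability, binarizes the selected masks, and multiplies the class probability by a mask-quality score (the mean foreground probability inside the binarized mask). The name `instance_inference`, the `top_k` and `mask_threshold` defaults, and the "no object" last column are illustrative assumptions, not the library's implementation.

```python
import numpy as np


def instance_inference(class_logits, mask_logits, top_k=3, mask_threshold=0.5):
    """Turn per-query outputs into scored instance masks (hypothetical sketch).

    class_logits: (num_queries, num_classes + 1); last column assumed to be "no object".
    mask_logits:  (num_queries, height, width).
    Returns (binary_masks, labels, scores) for the top_k (query, class) pairs.
    """
    num_queries, num_classes = class_logits.shape[0], class_logits.shape[1] - 1
    exp = np.exp(class_logits - class_logits.max(axis=1, keepdims=True))
    class_probs = (exp / exp.sum(axis=1, keepdims=True))[:, :-1]  # drop "no object"
    # Rank all (query, class) pairs, so one query may yield several labelled instances
    flat = class_probs.ravel()
    top = np.argsort(flat)[::-1][:top_k]
    query_idx, labels = np.unravel_index(top, (num_queries, num_classes))
    mask_probs = 1.0 / (1.0 + np.exp(-mask_logits[query_idx]))
    binary = mask_probs > mask_threshold
    # Mask quality: mean foreground probability inside the binarized mask
    mask_scores = (mask_probs * binary).sum(axis=(1, 2)) / (binary.sum(axis=(1, 2)) + 1e-6)
    scores = flat[top] * mask_scores
    return binary, labels, scores


rng = np.random.default_rng(0)
masks, labels, scores = instance_inference(rng.normal(size=(4, 3)), rng.normal(size=(4, 8, 8)))
print(masks.shape, labels.shape, scores.shape)  # (3, 8, 8) (3,) (3,)
```

Multiplying the class probability by a mask-quality term down-weights queries that are confident about the class but produce a weak or diffuse mask.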

![model image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/mask2former_architecture.png)

### Intended uses & limitations
You can use this particular checkpoint for panoptic segmentation. See the [model hub](https://huggingface.co/models?search=mask2former) to look for other fine-tuned versions on a task that interests you.
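Since this checkpoint targets panoptic segmentation, it may help to see roughly how per-query outputs can be merged into a single map in which every pixel belongs to at most one segment. The simplified NumPy sketch below keeps queries whose most likely class is not "no object" and whose confidence exceeds a threshold, then assigns each pixel to the kept query with the highest class score times mask probability. The name `panoptic_inference`, the 0.8 threshold, and the "no object" last column are assumptions. The library's post-processing applies additional filtering, so treat this as an illustration of the idea rather than a reimplementation of `post_process_panoptic_segmentation`.

```python
import numpy as np


def panoptic_inference(class_logits, mask_logits, score_threshold=0.8):
    """Merge per-query outputs into one panoptic map (hypothetical, simplified sketch).

    Returns (segment_map, segments): segment_map holds a segment id per pixel
    (0 = unassigned) and segments lists (segment_id, label, score) tuples.
    """
    exp = np.exp(class_logits - class_logits.max(axis=1, keepdims=True))
    probs = exp / exp.sum(axis=1, keepdims=True)
    no_object = probs.shape[1] - 1
    labels = probs.argmax(axis=1)
    scores = probs.max(axis=1)
    keep = (labels != no_object) & (scores > score_threshold)
    h, w = mask_logits.shape[1:]
    segment_map = np.zeros((h, w), dtype=int)
    if not keep.any():
        return segment_map, []
    mask_probs = 1.0 / (1.0 + np.exp(-mask_logits[keep]))
    # Each pixel goes to the kept query with the highest class score times mask probability
    owner = (scores[keep][:, None, None] * mask_probs).argmax(axis=0)
    segments = []
    for i, (label, score) in enumerate(zip(labels[keep], scores[keep])):
        region = owner == i
        if region.any():
            segment_map[region] = i + 1
            segments.append((i + 1, int(label), float(score)))
    return segment_map, segments


# Three queries: class 0, class 1, and "no object" (the last one is discarded)
cl = np.array([[5.0, -5.0, -5.0], [-5.0, 5.0, -5.0], [-5.0, -5.0, 5.0]])
ml = np.zeros((3, 2, 4))
ml[0, :, :2], ml[0, :, 2:] = 6.0, -6.0
ml[1, :, :2], ml[1, :, 2:] = -6.0, 6.0
segment_map, segments = panoptic_inference(cl, ml)
print(segment_map)
# [[1 1 2 2]
#  [1 1 2 2]]
```

The resulting `segment_map` plays a role similar to `result["segmentation"]` in the usage example: one integer segment id per pixel.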

## 💻 Usage Examples
### Basic Usage
```python
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation


# load Mask2Former fine-tuned on COCO panoptic segmentation
processor = AutoImageProcessor.from_pretrained("facebook/mask2former-swin-tiny-coco-panoptic")
model = Mask2FormerForUniversalSegmentation.from_pretrained("facebook/mask2former-swin-tiny-coco-panoptic")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# the model predicts class_queries_logits of shape `(batch_size, num_queries, num_labels + 1)`
# and masks_queries_logits of shape `(batch_size, num_queries, height, width)`,
# where the masks are at a lower resolution than the input image
class_queries_logits = outputs.class_queries_logits
masks_queries_logits = outputs.masks_queries_logits

# you can pass them to processor for postprocessing
result = processor.post_process_panoptic_segmentation(outputs, target_sizes=[image.size[::-1]])[0]
# we refer to the demo notebooks for visualization (see "Resources" section in the Mask2Former docs)
predicted_panoptic_map = result["segmentation"]
```

### Advanced Usage

For more code examples, we refer to the documentation.

## 📄 License

License: other

| Property | Details |
|---|---|
| Tags | vision, image-segmentation |
| Datasets | coco |


Disclaimer: The team releasing Mask2Former did not write a model card for this model so this model card has been written by the Hugging Face team.
