Open-source Mask2Former model: handles instance, semantic, and panoptic segmentation tasks

Mask2former Swin Large Ade Panoptic

Developed by facebook

Mask2Former model trained on the ADE20k panoptic segmentation dataset using a Swin large backbone network, employing a unified paradigm to handle instance segmentation, semantic segmentation, and panoptic segmentation tasks.

Image Segmentation

Transformers

Open Source License: Other #Panoptic Segmentation #Unified Multi-task Architecture #Masked Attention Mechanism

Downloads 2,625

Release Time: 1/5/2023

Model Overview

Mask2Former is a universal image segmentation model that unifies instance segmentation, semantic segmentation, and panoptic segmentation by predicting a set of masks and their corresponding labels, treating all three tasks as instance segmentation problems.
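
To make the "set of masks plus labels" idea concrete, here is a minimal NumPy sketch of how per-query class scores and per-query masks can be combined into a semantic map. The function name `semantic_map_from_queries` and the toy inputs are hypothetical, and the sketch assumes the last class column is a "no object" class that is dropped before combining. It illustrates the general mask-classification recipe rather than the model's exact post-processing; instance and panoptic outputs merge the same masks differently.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def semantic_map_from_queries(class_logits, mask_logits):
    """Combine per-query predictions into a per-pixel label map.

    class_logits: (num_queries, num_labels + 1); last column is assumed
                  to be the "no object" class.
    mask_logits:  (num_queries, height, width).
    Returns an integer array of shape (height, width).
    """
    class_probs = softmax(class_logits, axis=-1)[:, :-1]  # drop "no object"
    mask_probs = sigmoid(mask_logits)
    # Per-pixel score of each label: sum over queries of
    # (probability the query has that label) * (probability the query covers the pixel).
    scores = np.einsum("qc,qhw->chw", class_probs, mask_probs)
    return scores.argmax(axis=0)


# Toy example: 2 queries, 2 labels, a 2x2 image.
class_logits = np.array([[5.0, 0.0, 0.0],    # query 0 votes for label 0
                         [0.0, 5.0, 0.0]])   # query 1 votes for label 1
mask_logits = np.array([[[4.0, 4.0], [-4.0, -4.0]],    # query 0 covers the top row
                        [[-4.0, -4.0], [4.0, 4.0]]])   # query 1 covers the bottom row
semantic_map = semantic_map_from_queries(class_logits, mask_logits)
print(semantic_map)
```

In this toy case the top row is assigned label 0 and the bottom row label 1, because each pixel takes the label of the query that both covers it and is confident about its class.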

Model Features

Unified Segmentation Paradigm

By predicting a set of masks and their corresponding labels, it unifies instance segmentation, semantic segmentation, and panoptic segmentation as instance segmentation problems.

Multi-scale Deformable Attention

Uses a multi-scale deformable attention Transformer to upgrade the pixel decoder, improving model performance.

Masked Attention Mechanism

Introduces a Transformer decoder with a masked attention mechanism, enhancing performance without increasing computational cost.

Efficient Training

Significantly improves training efficiency by computing losses on subsampled points.
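
The idea of computing the mask loss on subsampled points can be sketched as follows. This is a simplified illustration, not the released training code: it samples pixel locations uniformly on the grid, while the actual sampling strategy may differ. The names `point_sampled_mask_loss` and `bce_with_logits` and the choice of 256 points are assumptions made for the example.

```python
import numpy as np


def bce_with_logits(logits, targets):
    # Numerically stable binary cross-entropy computed directly on logits.
    return np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))


def point_sampled_mask_loss(mask_logits, target_mask, num_points, rng):
    """Mean BCE over `num_points` uniformly sampled pixels instead of the whole mask.

    mask_logits, target_mask: arrays of shape (height, width).
    """
    h, w = mask_logits.shape
    idx = rng.choice(h * w, size=num_points, replace=False)  # assumed uniform sampling
    sampled_logits = mask_logits.reshape(-1)[idx]
    sampled_targets = target_mask.reshape(-1)[idx]
    return bce_with_logits(sampled_logits, sampled_targets).mean()


rng = np.random.default_rng(0)
target = np.zeros((64, 64))
target[16:48, 16:48] = 1.0               # a square ground-truth mask
good_logits = (target * 2 - 1) * 6.0     # confident and correct everywhere
bad_logits = -good_logits                # confident and wrong everywhere

# Only 256 of the 4096 pixels are touched per loss evaluation.
loss_good = point_sampled_mask_loss(good_logits, target, 256, rng)
loss_bad = point_sampled_mask_loss(bad_logits, target, 256, rng)
print(loss_good, loss_bad)
```

The loss is evaluated on a small fraction of the pixels, which is where the memory and compute savings during training come from; the sampled estimate still separates a good prediction from a bad one.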

Model Capabilities

Image Segmentation

Instance Segmentation

Semantic Segmentation

Panoptic Segmentation

Use Cases

Computer Vision

Scene Understanding

Used to understand objects and their relationships in complex scenes

Autonomous Driving

Used for object recognition and segmentation in road scenes

🚀 Mask2Former

The Mask2Former model (large-sized version, Swin backbone) is trained on ADE20k panoptic segmentation. It offers a unified approach to image segmentation tasks.

🚀 Quick Start

This model was introduced in the paper Masked-attention Mask Transformer for Universal Image Segmentation and first released in this repository.

Disclaimer: The team releasing Mask2Former did not write a model card for this model so this model card has been written by the Hugging Face team.

✨ Features

Model description

Mask2Former addresses instance, semantic and panoptic segmentation with the same paradigm: by predicting a set of masks and corresponding labels. Hence, all 3 tasks are treated as if they were instance segmentation. Mask2Former outperforms the previous SOTA, MaskFormer, in terms of both performance and efficiency by (i) replacing the pixel decoder with a more advanced multi-scale deformable attention Transformer, (ii) adopting a Transformer decoder with masked attention to boost performance without introducing additional computation, and (iii) improving training efficiency by calculating the loss on subsampled points instead of whole masks.
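
Point (ii), masked attention, can be illustrated with a small single-head sketch in NumPy. It restricts each query's cross-attention to the image locations that its mask prediction from the previous decoder layer marks as foreground. The function name `masked_cross_attention`, the 0.5 probability threshold, and the fallback to full attention for an empty mask are assumptions made for this illustration; learned projections and multiple heads are omitted.

```python
import numpy as np


def masked_cross_attention(queries, keys, values, prev_mask_logits):
    """Single-head cross-attention restricted to each query's predicted foreground.

    queries:          (Q, d) object queries
    keys, values:     (N, d) image features, N = height * width locations
    prev_mask_logits: (Q, N) mask logits predicted by the previous decoder layer
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)            # (Q, N)
    foreground = prev_mask_logits > 0.0               # same as sigmoid(logit) > 0.5
    # Assumed fallback: a query with an empty mask attends everywhere.
    empty = ~foreground.any(axis=-1, keepdims=True)
    allowed = foreground | empty
    scores = np.where(allowed, scores, -np.inf)       # block background locations
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ values, weights


rng = np.random.default_rng(0)
Q, N, d = 2, 6, 4
queries = rng.normal(size=(Q, d))
keys = rng.normal(size=(N, d))
values = rng.normal(size=(N, d))
prev_mask_logits = np.full((Q, N), -3.0)
prev_mask_logits[0, :2] = 3.0   # query 0: foreground at locations 0 and 1; query 1: empty mask

attended, weights = masked_cross_attention(queries, keys, values, prev_mask_logits)
print(weights.round(3))
```

Query 0 places all of its attention on its two foreground locations, while query 1, whose mask is empty, attends over every location. The attention matrix has the same shape as in ordinary cross-attention, which is consistent with the claim that the mask adds no extra computation.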


Intended uses & limitations

You can use this particular checkpoint for panoptic segmentation. See the model hub to look for other fine-tuned versions on a task that interests you.


💻 Usage Examples

Basic Usage

import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation


# load Mask2Former fine-tuned on ADE20k panoptic segmentation
processor = AutoImageProcessor.from_pretrained("facebook/mask2former-swin-large-ade-panoptic")
model = Mask2FormerForUniversalSegmentation.from_pretrained("facebook/mask2former-swin-large-ade-panoptic")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# model predicts class_queries_logits of shape `(batch_size, num_queries, num_labels + 1)`
# and masks_queries_logits of shape `(batch_size, num_queries, height, width)`
class_queries_logits = outputs.class_queries_logits
masks_queries_logits = outputs.masks_queries_logits

# you can pass them to processor for postprocessing
result = processor.post_process_panoptic_segmentation(outputs, target_sizes=[image.size[::-1]])[0]
# we refer to the demo notebooks for visualization (see "Resources" section in the Mask2Former docs)
predicted_panoptic_map = result["segmentation"]
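
Beyond the segmentation map, the post-processed result is expected to carry per-segment metadata. The sketch below summarizes it into readable rows. It assumes the result dictionary has a "segments_info" list whose entries contain "id", "label_id", and "score", and that the segmentation map can be converted with np.asarray (a CPU tensor normally can). The helper name `summarize_segments` and the toy data are made up for illustration, so the example runs without downloading the model.

```python
import numpy as np


def summarize_segments(result, id2label):
    """Turn a panoptic result into a list of readable per-segment rows.

    result:   dict with "segmentation" (H x W map of segment ids) and
              "segments_info" (list of dicts with "id", "label_id", "score");
              this layout is an assumption about the post-processing output.
    id2label: mapping from label id to class name.
    """
    seg_map = np.asarray(result["segmentation"])
    rows = []
    for info in result["segments_info"]:
        rows.append({
            "segment_id": info["id"],
            "label": id2label[info["label_id"]],
            "score": float(info["score"]),
            "pixel_area": int((seg_map == info["id"]).sum()),
        })
    # Largest segments first.
    return sorted(rows, key=lambda row: row["pixel_area"], reverse=True)


# Made-up stand-in for the processor output, small enough to check by hand.
toy_result = {
    "segmentation": [[1, 1, 2],
                     [1, 2, 2],
                     [2, 2, 2]],
    "segments_info": [
        {"id": 1, "label_id": 0, "score": 0.98},
        {"id": 2, "label_id": 3, "score": 0.91},
    ],
}
toy_id2label = {0: "wall", 3: "floor"}

segments = summarize_segments(toy_result, toy_id2label)
for row in segments:
    print(row)
```

With the real model, the same helper could be called as `summarize_segments(result, model.config.id2label)`, assuming the configuration exposes an `id2label` mapping with integer keys.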

Advanced Usage

For more code examples, we refer to the documentation.


📄 License

The license for this model is "other".

Property	Details
Model Type	Mask2Former model trained on ADE20k panoptic segmentation (large-sized version, Swin backbone)
Training Data	ADE20k
