Open-source Mask2Former-Swin-Large Model - Unified Processing of Image Instance, Semantic, and Panoptic Segmentation

Mask2former Swin Large Mapillary Vistas Semantic

Developed by facebook

A large-scale Mask2Former model based on the Swin backbone network, designed for general image segmentation tasks, unifying instance segmentation, semantic segmentation, and panoptic segmentation.

Image Segmentation

Transformers

Open Source License: Other

#Panoptic Segmentation #Unified Multi-task Framework #Swin Backbone Network

Downloads 5,539

Release Time: 1/5/2023

Model Overview

Mask2Former is an image segmentation model that handles instance, semantic, and panoptic segmentation in a unified manner by predicting a set of masks and their corresponding labels. Compared with its predecessor, MaskFormer, it improves both performance and efficiency.

Model Features

Unified Segmentation Framework

Unifies instance segmentation, semantic segmentation, and panoptic segmentation as a mask prediction problem, simplifying task processing.

Efficient Attention Mechanism

Replaces the pixel decoder used in MaskFormer with a multi-scale deformable attention Transformer, improving computational efficiency.

Masked Attention Decoder

Introduces a Transformer decoder with masked attention, enhancing performance without increasing computational load.

Efficient Training Strategy

Calculates loss based on sampled points rather than full masks, significantly improving training efficiency.

Model Capabilities

Semantic Segmentation

Instance Segmentation

Panoptic Segmentation

Image Understanding

Scene Parsing

Use Cases

Autonomous Driving

Road Scene Understanding

Identifies and segments various elements in road scenes (vehicles, pedestrians, traffic signs, etc.)

Provides precise segmentation of scene elements to support autonomous driving decisions.

Remote Sensing Image Analysis

Land Cover Classification

Segments and classifies different land cover types in satellite or aerial images.

Accurately identifies and segments various land cover types, supporting land use analysis.

Medical Imaging

Organ Segmentation

Segments specific organs or lesion areas in medical images.

Provides precise organ boundary identification to assist in diagnosis and treatment.

🚀 Mask2Former

The Mask2Former model is trained on Mapillary Vistas semantic segmentation (large-sized version, Swin backbone). It offers a unified approach for instance, semantic, and panoptic segmentation.

🚀 Quick Start

The Mask2Former model is trained on Mapillary Vistas semantic segmentation (large-sized version, Swin backbone). It was introduced in the paper "Masked-attention Mask Transformer for Universal Image Segmentation" and first released in this repository.

Disclaimer: The team releasing Mask2Former did not write a model card for this model, so this model card has been written by the Hugging Face team.

✨ Features

Unified Segmentation Paradigm: Mask2Former addresses instance, semantic, and panoptic segmentation using the same approach by predicting a set of masks and corresponding labels.
Performance and Efficiency: It outperforms the previous SOTA, MaskFormer, in both performance and efficiency through several improvements:
- Replacing the pixel decoder with a more advanced multi-scale deformable attention Transformer.
- Adopting a Transformer decoder with masked attention to boost performance without additional computation.
- Improving training efficiency by calculating the loss on subsampled points instead of whole masks.
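
The last point can be illustrated with a minimal sketch. It assumes uniform random sampling of pixel locations and a plain binary cross-entropy loss; the function name `sampled_point_bce` and the toy masks are hypothetical, and the real training code may sample points differently and combine several loss terms.

```python
import math
import random

def sampled_point_bce(mask_logits, target_mask, num_points, rng):
    """Binary cross-entropy averaged over `num_points` uniformly sampled pixels."""
    height, width = len(mask_logits), len(mask_logits[0])
    # Sample pixel coordinates (with replacement) instead of visiting every pixel.
    points = [(rng.randrange(height), rng.randrange(width)) for _ in range(num_points)]
    total = 0.0
    for y, x in points:
        logit, label = mask_logits[y][x], target_mask[y][x]
        # Numerically stable BCE with logits: max(z, 0) - z * t + log(1 + exp(-|z|))
        total += max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))
    return total / num_points

rng = random.Random(0)
# Toy 4x4 example: the left half of the image is foreground.
target = [[1, 1, 0, 0] for _ in range(4)]
good_logits = [[4.0, 4.0, -4.0, -4.0] for _ in range(4)]  # agrees with the target
bad_logits = [[-4.0, -4.0, 4.0, 4.0] for _ in range(4)]   # disagrees with the target

good_loss = sampled_point_bce(good_logits, target, num_points=8, rng=rng)
bad_loss = sampled_point_bce(bad_logits, target, num_points=8, rng=rng)
```

Only 8 of the 16 pixels are evaluated per call, yet the loss still separates the good prediction from the bad one. The cost of the loss then scales with the number of sampled points rather than with the mask resolution.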

📚 Documentation

Model description

Mask2Former addresses instance, semantic, and panoptic segmentation with the same paradigm: by predicting a set of masks and corresponding labels. Hence, all three tasks are treated as if they were instance segmentation. Mask2Former outperforms the previous SOTA, MaskFormer, in both performance and efficiency by (i) replacing the pixel decoder with a more advanced multi-scale deformable attention Transformer, (ii) adopting a Transformer decoder with masked attention to boost performance without introducing additional computation, and (iii) improving training efficiency by calculating the loss on subsampled points instead of whole masks.
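
To make point (ii) more concrete, here is a minimal sketch of masked attention for a single query. It assumes the mask predicted by the previous decoder layer is binarized at 0.5 and that background locations are excluded from the softmax; the function name `masked_attention_weights` and the toy numbers are hypothetical, and the fallback for an empty mask is an assumption of this sketch.

```python
import math

def masked_attention_weights(scores, prev_mask_probs, threshold=0.5):
    """Softmax over key positions, restricted to the predicted foreground.

    scores: raw query-key similarity for each pixel location.
    prev_mask_probs: mask probabilities predicted by the previous decoder layer.
    """
    allowed = [p >= threshold for p in prev_mask_probs]
    # Assumption for this sketch: if the mask is empty, attend everywhere
    # so that the softmax stays well defined.
    if not any(allowed):
        allowed = [True] * len(scores)
    # Background locations get -inf, so they receive exactly zero weight.
    masked = [s if keep else -math.inf for s, keep in zip(scores, allowed)]
    peak = max(masked)
    exps = [math.exp(s - peak) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]

# Four pixel locations; only the first two lie inside the previous mask.
weights = masked_attention_weights(
    scores=[2.0, 1.0, 3.0, 0.5],
    prev_mask_probs=[0.9, 0.8, 0.1, 0.2],
)
```

The third location has the highest raw score but lies outside the previous mask, so it receives no attention. Each query is thereby focused on the region it is already predicting.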


Intended uses & limitations

You can use this particular checkpoint for semantic segmentation. See the model hub to look for other fine-tuned versions on a task that interests you.

💻 Usage Examples

Basic Usage

import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation

# load Mask2Former fine-tuned on Mapillary Vistas semantic segmentation
processor = AutoImageProcessor.from_pretrained("facebook/mask2former-swin-large-mapillary-vistas-semantic")
model = Mask2FormerForUniversalSegmentation.from_pretrained("facebook/mask2former-swin-large-mapillary-vistas-semantic")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# model predicts class_queries_logits of shape `(batch_size, num_queries, num_labels + 1)`
# and masks_queries_logits of shape `(batch_size, num_queries, height, width)`
class_queries_logits = outputs.class_queries_logits
masks_queries_logits = outputs.masks_queries_logits

# you can pass them to processor for postprocessing
predicted_semantic_map = processor.post_process_semantic_segmentation(outputs, target_sizes=[image.size[::-1]])[0]
# we refer to the demo notebooks for visualization (see "Resources" section in the Mask2Former docs)

For more code examples, see the documentation.
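
As a rough illustration of how the two outputs can be combined into a per-pixel label map, the sketch below weights each query's mask probability by its class probability and takes the best label per pixel. It is a simplified pure-Python approximation of the idea behind semantic post-processing, not the library's implementation: the helper `semantic_map` and the toy logits are hypothetical, and resizing to the target size is left out.

```python
import math

def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))

def semantic_map(class_logits, mask_logits):
    """class_logits: [num_queries][num_labels + 1]; mask_logits: [num_queries][H][W]."""
    # Drop the trailing "no object" class after the softmax.
    class_probs = [softmax(row)[:-1] for row in class_logits]
    num_queries = len(class_logits)
    num_labels = len(class_probs[0])
    height, width = len(mask_logits[0]), len(mask_logits[0][0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Score of label c at this pixel: sum over queries of
            # P(query has label c) * P(pixel belongs to the query's mask).
            scores = [
                sum(class_probs[q][c] * sigmoid(mask_logits[q][y][x])
                    for q in range(num_queries))
                for c in range(num_labels)
            ]
            row.append(max(range(num_labels), key=scores.__getitem__))
        result.append(row)
    return result

# Two queries, two labels plus "no object", on a 2x2 image.
toy_class_logits = [[5.0, 0.0, 0.0],   # query 0 predicts label 0
                    [0.0, 5.0, 0.0]]   # query 1 predicts label 1
toy_mask_logits = [[[6.0, -6.0], [6.0, -6.0]],   # query 0 covers the left column
                   [[-6.0, 6.0], [-6.0, 6.0]]]   # query 1 covers the right column

label_map = semantic_map(toy_class_logits, toy_mask_logits)
# label_map == [[0, 1], [0, 1]]
```

In practice the same computation would run on tensors over all queries at once; the nested loops here only make each step explicit.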

📄 License

License: other

Property	Details
Tags	vision, image-segmentation
Datasets	coco
Widget Examples	- src: Cats - src: Castle
