
Mask2Former Swin Base IN21k ADE Semantic

Developed by Facebook AI Research
Mask2Former is a universal image segmentation model capable of handling instance segmentation, semantic segmentation, and panoptic segmentation tasks by predicting a set of masks and their corresponding labels.
Downloads: 879
Release Time: 1/5/2023

Model Overview

This model uses a Swin Transformer backbone pre-trained on ImageNet-21k and is fine-tuned on the ADE20K dataset for semantic segmentation, providing efficient and accurate segmentation through an improved Transformer architecture.
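
For concreteness, below is a minimal inference sketch using the Hugging Face transformers library. It assumes the checkpoint is available on the Hugging Face Hub as facebook/mask2former-swin-base-IN21k-ade-semantic (matching the model name above); the sample image URL is only a placeholder.

```python
# Minimal semantic segmentation inference sketch with Hugging Face transformers.
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation

# Assumed Hub id for this checkpoint.
checkpoint = "facebook/mask2former-swin-base-IN21k-ade-semantic"
processor = AutoImageProcessor.from_pretrained(checkpoint)
model = Mask2FormerForUniversalSegmentation.from_pretrained(checkpoint)

# Any RGB image works; this URL is just an illustrative placeholder.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Merge the predicted masks and class logits into a per-pixel label map
# at the original image resolution (height, width).
semantic_map = processor.post_process_semantic_segmentation(
    outputs, target_sizes=[image.size[::-1]]
)[0]
print(semantic_map.shape)  # tensor of ADE20K class indices, one per pixel
```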

Model Features

Unified Segmentation Architecture
Handles instance segmentation, semantic segmentation, and panoptic segmentation tasks with a single model architecture.
Improved Transformer Design
Uses a multi-scale deformable attention Transformer encoder and a masked-attention Transformer decoder to improve performance and efficiency.
Efficient Training Method
Significantly improves training efficiency by computing the mask loss on sampled points rather than on entire masks (see the sketch after this list).
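
To illustrate the point-sampling idea, here is a toy sketch of a mask loss evaluated only at randomly sampled point coordinates instead of over every pixel. It is a simplified stand-in, not the exact Mask2Former training code, which combines cross-entropy and dice terms and samples points by uncertainty rather than uniformly.

```python
# Toy sketch of a point-sampled mask loss (simplified illustration only).
import torch
import torch.nn.functional as F

def point_sampled_loss(pred_mask_logits, gt_masks, num_points=112 * 112):
    """pred_mask_logits, gt_masks: (N, H, W). Loss is computed on sampled points only."""
    n = pred_mask_logits.shape[0]
    # Random normalized (x, y) coordinates in [-1, 1], as expected by grid_sample.
    point_coords = torch.rand(n, num_points, 1, 2) * 2 - 1
    # Read predictions and targets at the sampled points (bilinear interpolation).
    pred_points = F.grid_sample(
        pred_mask_logits.unsqueeze(1), point_coords, align_corners=False
    ).squeeze(1).squeeze(-1)            # (N, num_points)
    gt_points = F.grid_sample(
        gt_masks.unsqueeze(1).float(), point_coords, align_corners=False
    ).squeeze(1).squeeze(-1)            # (N, num_points)
    # Binary cross-entropy on num_points samples instead of all H*W pixels.
    return F.binary_cross_entropy_with_logits(pred_points, gt_points)

# Example: 3 predicted masks at 256x256 resolution with random binary targets.
loss = point_sampled_loss(torch.randn(3, 256, 256), torch.randint(0, 2, (3, 256, 256)))
print(loss.item())
```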

Model Capabilities

Image Semantic Segmentation
Image Instance Segmentation
Image Panoptic Segmentation
Multi-scale Image Analysis
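
As a small follow-up to the semantic segmentation capability listed above, the per-pixel class indices returned by the model can be mapped to human-readable ADE20K label names through the checkpoint configuration. The sketch below uses a random placeholder map in place of a real prediction and assumes the same Hub id as the earlier example.

```python
# Minimal sketch: turn predicted class indices into ADE20K label names.
import torch
from transformers import AutoConfig

config = AutoConfig.from_pretrained("facebook/mask2former-swin-base-IN21k-ade-semantic")

# Placeholder label map standing in for the output of
# post_process_semantic_segmentation in the earlier sketch.
semantic_map = torch.randint(0, config.num_labels, (480, 640))

# Report a few of the classes present and how many pixels each covers.
values, counts = torch.unique(semantic_map, return_counts=True)
for class_id, count in zip(values.tolist()[:5], counts.tolist()[:5]):
    print(f"{config.id2label[class_id]}: {count} pixels")
```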

Use Cases

Computer Vision
Scene Understanding
Identify and segment different objects in complex scenes.
Provide accurate object-level segmentation of complex scenes.
Autonomous Driving
Analyze road scenes to identify vehicles, pedestrians, road signs, etc.
Provide precise environmental perception for autonomous driving systems.
Medical Imaging
Medical Image Analysis
Segment organs or lesion areas in medical images.
Assist doctors in diagnosis and treatment planning.