Mask2Former Swin Base COCO Panoptic
Mask2Former with a Swin backbone, trained on the COCO panoptic segmentation dataset, handles instance segmentation, semantic segmentation, and panoptic segmentation within a single unified mask-prediction paradigm.
Downloads: 45.01k
Release Time: 1/2/2023
Model Overview
Mask2Former is a universal image segmentation model that unifies instance, semantic, and panoptic segmentation by predicting a set of binary masks together with a class label for each mask. Compared with previous specialized models, it improves both accuracy and training efficiency.
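The unified paradigm can be illustrated with a minimal sketch (not the actual Mask2Former code): the model outputs N masks and N class distributions, and panoptic or semantic maps are derived from them. All shapes, names, and the random inputs below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, C, H, W = 5, 3, 4, 4                        # N queries, C classes, tiny H x W grid

mask_logits = rng.normal(size=(N, H, W))       # per-query binary mask logits
class_logits = rng.normal(size=(N, C + 1))     # last index = "no object"

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

mask_probs = 1 / (1 + np.exp(-mask_logits))    # sigmoid per mask
class_probs = softmax(class_logits)            # per-query class distribution

# Panoptic-style inference: assign each pixel to the query maximizing
# (class confidence * mask probability), ignoring the "no object" class.
scores = class_probs[:, :-1].max(axis=1)       # best real-class score per query
labels = class_probs[:, :-1].argmax(axis=1)    # best real class per query
pixel_scores = scores[:, None, None] * mask_probs
assignment = pixel_scores.argmax(axis=0)       # (H, W): winning query per pixel

# A semantic segmentation map falls out of the same predictions for free.
semantic_map = labels[assignment]
print(assignment.shape, semantic_map.shape)    # (4, 4) (4, 4)
```

Because instance, semantic, and panoptic outputs are all derived from the same mask-plus-label predictions, one trained model serves all three tasks.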
Model Features
Unified Segmentation Paradigm
Unifies instance segmentation, semantic segmentation, and panoptic segmentation as mask prediction problems, simplifying the task processing workflow.
Multi-scale Deformable Attention
Upgrades the pixel decoder with a multi-scale deformable attention mechanism to enhance feature extraction capabilities.
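The sampling step at the core of deformable attention can be sketched as follows: rather than attending over every pixel, each query bilinearly samples a few learned offset locations around a reference point and mixes them with learned weights. The shapes, seed, and single feature level here are simplifying assumptions (the real pixel decoder operates over multiple scales).

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Bilinearly sample feat of shape (H, W, D) at fractional (y, x)."""
    H, W, _ = feat.shape
    y = np.clip(y, 0, H - 1); x = np.clip(x, 0, W - 1)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * feat[y0, x0] + (1 - wy) * wx * feat[y0, x1]
            + wy * (1 - wx) * feat[y1, x0] + wy * wx * feat[y1, x1])

rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 8, 16))             # one feature level (H, W, D)
ref = np.array([3.5, 3.5])                     # query's reference point (y, x)
offsets = rng.normal(size=(4, 2))              # K=4 learned sampling offsets
attn = rng.random(4); attn /= attn.sum()       # K attention weights (sum to 1)

# Output is a weighted sum over K sampled points: O(K) per query
# instead of O(H*W) for dense attention.
out = sum(w * bilinear_sample(feat, *(ref + off)) for w, off in zip(attn, offsets))
print(out.shape)                               # (16,)
```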
Masked Attention Decoder
Employs a Transformer decoder with masked attention, which restricts each query's cross-attention to the foreground region of its predicted mask, improving convergence and accuracy without adding computational cost.
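A minimal sketch of the masked-attention idea: cross-attention logits for pixels that the previous layer's mask prediction marks as background are set to negative infinity, so each query attends only inside its current foreground estimate. The shapes, seed, and 0.5 threshold are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
Nq, Npix, D = 2, 6, 8
q = rng.normal(size=(Nq, D))                   # query embeddings
k = rng.normal(size=(Npix, D))                 # pixel (key) features
v = rng.normal(size=(Npix, D))                 # pixel (value) features
prev_mask_prob = rng.random((Nq, Npix))        # previous layer's mask prediction

fg = prev_mask_prob >= 0.5                     # foreground region per query
fg[~fg.any(axis=1)] = True                     # empty mask: fall back to full attention

logits = q @ k.T / np.sqrt(D)                  # standard attention logits
logits = np.where(fg, logits, -np.inf)         # background pixels get -inf

# Softmax over each row; exp(-inf) = 0, so background gets zero weight.
weights = np.exp(logits - logits.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)
out = weights @ v
print(out.shape)                               # (2, 8)
```

The masking is a cheap elementwise operation on logits that already exist, which is why it adds no meaningful compute.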
Efficient Training Strategy
Significantly enhances training efficiency by computing losses on subsampled points rather than full masks.
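The point-subsampled loss can be sketched like this: instead of evaluating the mask loss on every pixel of every predicted mask, sample K points per mask and compute the loss only there. Mask2Former additionally uses importance sampling of uncertain points; the uniform sampling and shapes below are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, H, W, K = 4, 64, 64, 128                    # 4 masks, 64x64 maps, 128 points each

pred_logits = rng.normal(size=(N, H, W))       # predicted mask logits
target = (rng.random((N, H, W)) > 0.5).astype(np.float64)  # ground-truth masks

ys = rng.integers(0, H, size=(N, K))           # sampled point rows per mask
xs = rng.integers(0, W, size=(N, K))           # sampled point cols per mask
idx = np.arange(N)[:, None]

p = pred_logits[idx, ys, xs]                   # (N, K) logits at sampled points
t = target[idx, ys, xs]                        # (N, K) targets at sampled points

# Binary cross-entropy with logits on K points instead of H*W pixels:
# log(1 + e^p) - t*p, here over ~32x fewer locations per mask.
bce = np.mean(np.logaddexp(0, p) - t * p)
print(p.shape)                                 # (4, 128)
```

Since the mask loss is computed for every query-to-ground-truth pair during matching, shrinking the per-mask cost from H*W pixels to K points cuts both memory and compute substantially.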
Model Capabilities
Image Segmentation
Instance Segmentation
Semantic Segmentation
Panoptic Segmentation
Use Cases
Computer Vision
Scene Understanding
Accurately segments and classifies objects in complex scenes
Simultaneously identifies object instances and semantic categories
Autonomous Driving
Parses road scenes to identify vehicles, pedestrians, roads, and other elements
Provides precise object boundaries and category information