Mask2Former-Swin-Large-COCO-Panoptic Open-Source Image Segmentation Model - Suitable for COCO Panoptic Segmentation Task

Mask2Former Swin Large COCO Panoptic

Developed by facebook

The large-sized variant of Mask2Former with a Swin backbone, trained for panoptic segmentation on the COCO dataset

Image Segmentation

Transformers

Open Source License: Other #Panoptic Segmentation #Unified Multi-task Framework #Swin Backbone Network

Downloads 37.67k

Release Time: 1/2/2023

Model Overview

Mask2Former is a unified image segmentation framework that handles instance segmentation, semantic segmentation, and panoptic segmentation tasks by predicting a set of masks and their corresponding labels. Compared to its predecessor MaskFormer, it shows significant improvements in both performance and efficiency.
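To make "predicting a set of masks and their corresponding labels" concrete, here is a minimal NumPy sketch of how per-query predictions could be merged into a single semantic map. It assumes the last class column is a "no object" class and that mask logits are already at the output resolution. The function name `semantic_map_from_queries` is hypothetical, and this is an illustration of the idea rather than the library's post-processing code.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def semantic_map_from_queries(class_logits, mask_logits):
    """Merge per-query predictions into one class id per pixel.

    class_logits: (num_queries, num_classes + 1); last column is assumed
                  to be the "no object" class.
    mask_logits:  (num_queries, H, W)
    Returns an (H, W) integer array of class ids.
    """
    class_probs = softmax(class_logits, axis=-1)[:, :-1]  # drop "no object"
    mask_probs = sigmoid(mask_logits)
    # Per-class score map: sum over queries of class prob * mask prob.
    scores = np.einsum("qc,qhw->chw", class_probs, mask_probs)
    return scores.argmax(axis=0)
```

Each query votes for its class only in the pixels its mask covers, so one set of mask and label predictions can serve semantic, instance, and panoptic outputs with different merging rules.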

Model Features

Unified Segmentation Framework

Treats instance segmentation, semantic segmentation, and panoptic segmentation uniformly as mask prediction problems, simplifying the task processing pipeline

Multi-scale Deformable Attention

Uses a multi-scale deformable attention Transformer as the pixel decoder in place of the one used in MaskFormer, enhancing feature extraction capabilities

Masked Attention Mechanism

Introduces masked attention in the Transformer decoder, significantly improving performance without increasing computational load

Efficient Training Strategy

Calculates loss through sampled points rather than entire masks, greatly improving training efficiency
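The point-sampling idea can be sketched as follows. This is a simplified illustration, not the training code: it assumes uniform random sampling at integer pixel positions, whereas a real implementation may sample at fractional coordinates with interpolation and may favor uncertain regions. The function name `point_sampled_bce` and its parameters are hypothetical.

```python
import numpy as np


def point_sampled_bce(mask_logits, target_mask, num_points, rng):
    """Binary cross-entropy evaluated on a random subset of pixels.

    mask_logits: (H, W) float array of predicted mask logits
    target_mask: (H, W) array of 0/1 ground-truth values
    num_points:  number of pixel locations to sample
    rng:         a numpy.random.Generator
    """
    h, w = mask_logits.shape
    ys = rng.integers(0, h, size=num_points)
    xs = rng.integers(0, w, size=num_points)
    logits = mask_logits[ys, xs]
    targets = target_mask[ys, xs].astype(np.float64)
    # Numerically stable BCE with logits:
    # max(x, 0) - x * t + log(1 + exp(-|x|))
    loss = (
        np.maximum(logits, 0.0)
        - logits * targets
        + np.log1p(np.exp(-np.abs(logits)))
    )
    return loss.mean()
```

The cost of the loss then depends on `num_points` rather than on the mask resolution, which is where the memory and time savings come from.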

Model Capabilities

Image Segmentation

Instance Recognition

Semantic Understanding

Panoptic Scene Parsing

Use Cases

Computer Vision

Autonomous Driving Scene Understanding

Used to identify various objects and their precise boundaries in road scenes

Accurately segments elements such as vehicles, pedestrians, and road signs

Medical Image Analysis

Assists in segmenting organs or lesion areas in medical imaging

Provides precise organ boundary delineation

Remote Sensing Image Analysis

Analyzes the distribution of geographical features in satellite or aerial images

Identifies geographical elements such as buildings, vegetation, and water bodies

🚀 Mask2Former

The Mask2Former model is trained on COCO panoptic segmentation (large-sized version, Swin backbone). It offers a unified solution for image segmentation tasks.

🚀 Quick Start

The Mask2Former model trained on COCO panoptic segmentation (large-sized version, Swin backbone) was introduced in the paper Masked-attention Mask Transformer for Universal Image Segmentation and first released in this repository.

Disclaimer: The team releasing Mask2Former did not write a model card for this model, so this model card has been written by the Hugging Face team.

✨ Features

Unified Paradigm: Mask2Former addresses instance, semantic, and panoptic segmentation with the same paradigm by predicting a set of masks and corresponding labels. All three tasks are therefore treated as if they were instance segmentation.
Performance and Efficiency: It outperforms the previous SOTA, MaskFormer, both in terms of performance and efficiency. This is achieved by:
- Replacing the pixel decoder with a more advanced multi-scale deformable attention Transformer.
- Adopting a Transformer decoder with masked attention to boost performance without introducing additional computation.
- Improving training efficiency by calculating the loss on subsampled points instead of whole masks.
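As a rough illustration of the masked-attention point above, the NumPy sketch below restricts each query's cross-attention to the pixels that the previous layer's mask prediction marks as foreground. It is a single-head simplification without learned projections. The function name `masked_cross_attention`, the thresholding at logit 0, and the fallback to full attention for an empty mask are assumptions made for this example.

```python
import numpy as np


def masked_cross_attention(queries, keys, values, prev_mask_logits):
    """Single-head cross-attention limited to predicted foreground.

    queries:          (Q, d) query embeddings
    keys, values:     (HW, d) flattened image features
    prev_mask_logits: (Q, HW) mask logits from the previous decoder layer
    Returns (attended, weights) with shapes (Q, d) and (Q, HW).
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)  # (Q, HW)
    keep = prev_mask_logits > 0  # equivalent to sigmoid > 0.5
    # Assumption: a query whose mask is empty attends to every location,
    # so the softmax below never sees a row made only of -inf.
    empty = ~keep.any(axis=1)
    keep[empty] = True
    scores = np.where(keep, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ values, weights
```

The attention matrix has the same shape as in ordinary cross-attention; only the entries outside the predicted mask are zeroed, which is consistent with the claim that the mechanism adds no extra computation.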


📚 Documentation

Intended uses & limitations

You can use this particular checkpoint for panoptic segmentation. See the model hub to look for other fine-tuned versions on a task that interests you.

How to use

Here is how to use this model:

import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation


# load Mask2Former fine-tuned on COCO panoptic segmentation
processor = AutoImageProcessor.from_pretrained("facebook/mask2former-swin-large-coco-panoptic")
model = Mask2FormerForUniversalSegmentation.from_pretrained("facebook/mask2former-swin-large-coco-panoptic")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# model predicts class_queries_logits of shape `(batch_size, num_queries, num_labels + 1)`
# and masks_queries_logits of shape `(batch_size, num_queries, height, width)`
class_queries_logits = outputs.class_queries_logits
masks_queries_logits = outputs.masks_queries_logits

# you can pass them to processor for postprocessing
result = processor.post_process_panoptic_segmentation(outputs, target_sizes=[image.size[::-1]])[0]
# we refer to the demo notebooks for visualization (see "Resources" section in the Mask2Former docs)
predicted_panoptic_map = result["segmentation"]

For more code examples, we refer to the documentation.
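The post-processed result also carries a `segments_info` list alongside the segmentation map. The helper below is a small sketch of how the two could be combined into a readable summary. The name `summarize_segments` is hypothetical, and the code assumes each entry in `segments_info` has `id`, `label_id`, and `score` keys and that the segmentation map has been converted to a NumPy array.

```python
import numpy as np


def summarize_segments(segmentation, segments_info, id2label):
    """List each predicted segment with its label, score, and pixel area.

    segmentation:  (H, W) array of segment ids
    segments_info: list of dicts with "id", "label_id", and "score" keys
    id2label:      mapping from label id to class name
    """
    seg = np.asarray(segmentation)
    rows = []
    for info in segments_info:
        rows.append(
            {
                "segment_id": info["id"],
                "label": id2label[info["label_id"]],
                "score": float(info["score"]),
                "area_px": int((seg == info["id"]).sum()),
            }
        )
    # Largest segments first.
    return sorted(rows, key=lambda r: r["area_px"], reverse=True)
```

With the variables from the example above, a call might look like `summarize_segments(result["segmentation"].cpu().numpy(), result["segments_info"], model.config.id2label)`.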

📄 License

License: other

Property	Details
Tags	vision, image-segmentation
Datasets	coco
