Mask2Former Swin-Small Cityscapes Instance Segmentation
Mask2Former is a unified, Transformer-based image segmentation model that uses a masked attention mechanism to improve performance.
Release Date: 1/5/2023
Model Overview
This model is the small variant of Mask2Former, using a Swin Transformer backbone and fine-tuned for instance segmentation on the Cityscapes dataset. The same unified architecture also supports semantic and panoptic segmentation.
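Assuming this card corresponds to the `facebook/mask2former-swin-small-cityscapes-instance` checkpoint on the Hugging Face Hub (the exact model ID is an assumption based on the card's title), inference can be sketched with the `transformers` library:

```python
# Minimal inference sketch. The checkpoint ID is an assumption inferred
# from this card's title; running the main block downloads the weights.
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation

MODEL_ID = "facebook/mask2former-swin-small-cityscapes-instance"

def run_instance_segmentation(image: Image.Image):
    processor = AutoImageProcessor.from_pretrained(MODEL_ID)
    model = Mask2FormerForUniversalSegmentation.from_pretrained(MODEL_ID)
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Post-process into a per-pixel instance map plus per-segment metadata
    result = processor.post_process_instance_segmentation(
        outputs, target_sizes=[image.size[::-1]]  # (height, width)
    )[0]
    return result["segmentation"], result["segments_info"]

if __name__ == "__main__":
    url = "http://images.cocodataset.org/val2017/000000039769.jpg"
    image = Image.open(requests.get(url, stream=True).raw)
    seg_map, segments = run_instance_segmentation(image)
    print(f"Found {len(segments)} instances")
```

Each entry in `segments_info` carries a predicted label ID and confidence score; `segmentation` is a per-pixel map assigning each pixel to an instance.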
Model Features
Unified Segmentation Architecture
Uses a unified paradigm to handle instance segmentation, semantic segmentation, and panoptic segmentation tasks
Mask Attention Mechanism
Introduces a Transformer decoder with masked attention, which restricts each query's cross-attention to its predicted mask region, improving performance without increasing computational cost
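The masked attention idea can be illustrated with a small NumPy sketch (not the model's actual implementation): attention logits outside each query's predicted mask are set to negative infinity before the softmax, so those locations receive zero weight. Queries whose mask is entirely empty fall back to full attention, keeping the softmax well-defined.

```python
# Illustrative sketch of masked cross-attention, assuming a simplified
# single-head setup; the real model operates on multi-scale features.
import numpy as np

def masked_attention(Q, K, V, mask):
    """Q: (nq, d); K, V: (nk, d); mask: (nq, nk) boolean, True = attend."""
    d = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d)              # scaled dot-product scores
    logits = np.where(mask, logits, -np.inf)   # block locations outside mask
    # Fall back to full attention for queries with an entirely empty mask
    empty = ~mask.any(axis=1)
    logits[empty] = (Q[empty] @ K.T) / np.sqrt(d)
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w = w / w.sum(axis=1, keepdims=True)
    return w @ V, w

rng = np.random.default_rng(0)
Q = rng.standard_normal((2, 8))
K = rng.standard_normal((5, 8))
V = rng.standard_normal((5, 8))
# Query 0 attends to locations 0-1, query 1 to locations 2-4
mask = np.array([[True, True, False, False, False],
                 [False, False, True, True, True]])
out, w = masked_attention(Q, K, V, mask)
```

Because the masking is applied to logits rather than to the feature maps, the computation per attention layer is essentially unchanged, which is why the mechanism adds no extra cost.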
Efficient Training Strategy
Computes the mask loss on sampled points rather than entire masks, significantly reducing memory use and improving training efficiency
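A minimal sketch of the point-sampled loss idea: instead of evaluating binary cross-entropy at every pixel of a predicted mask, the loss is computed only at K sampled points. Uniform sampling is used here for brevity; the actual training recipe combines uniform and importance sampling.

```python
# Sketch of point-sampled mask loss: evaluate BCE at a handful of sampled
# points instead of the full mask, cutting training memory and compute.
import numpy as np

def sampled_bce_loss(pred_logits, target, num_points=16, rng=None):
    """pred_logits, target: (H, W) arrays; points drawn uniformly at random."""
    if rng is None:
        rng = np.random.default_rng(0)
    H, W = target.shape
    ys = rng.integers(0, H, size=num_points)
    xs = rng.integers(0, W, size=num_points)
    logits = pred_logits[ys, xs]
    labels = target[ys, xs]
    # Numerically stable binary cross-entropy on the sampled points only
    loss = (np.maximum(logits, 0) - logits * labels
            + np.log1p(np.exp(-np.abs(logits))))
    return loss.mean()

pred = np.full((64, 64), 4.0)  # confident foreground prediction everywhere
gt = np.ones((64, 64))         # ground truth: all foreground
loss = sampled_bce_loss(pred, gt)
```

With 16 points instead of 64 x 64 = 4096 pixels, the per-mask loss computation touches two orders of magnitude fewer values, which is where the training-efficiency gain comes from.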
Model Capabilities
Image Instance Segmentation
Multi-scale Feature Extraction
High-precision Object Boundary Recognition
Use Cases
Autonomous Driving
Street Scene Object Recognition
Identifies instances such as vehicles and pedestrians in urban street scenes
Achieves strong performance on the Cityscapes dataset
Smart Surveillance
Scene Analysis
Precisely segments and identifies objects in surveillance footage