Mask2Former Open-Source Image Segmentation Model - Free Deployment for Unified Handling of Instance, Semantic, and Panoptic Segmentation

Mask2Former Swin Base IN21k Cityscapes Instance

Developed by facebook

Mask2Former is a Transformer-based general-purpose image segmentation model that unifies instance, semantic, and panoptic segmentation tasks.

Image Segmentation

Transformers

Open Source License: Other
#General Image Segmentation #Masked Attention Mechanism #Multi-scale Deformable Attention

Downloads 53

Release Time : 1/5/2023

Model Overview

This model achieves instance segmentation by predicting a set of masks and their corresponding labels, utilizing a Swin Transformer backbone and fine-tuned on the Cityscapes dataset.

Model Features

Unified Segmentation Architecture

Unifies instance, semantic, and panoptic segmentation as a mask prediction problem.
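
To make "segmentation as mask prediction" concrete, here is a minimal, dependency-free sketch of how a set of (class distribution, mask) pairs could be collapsed into a semantic map: each pixel takes the class whose summed "class probability times mask probability" over all queries is highest. This follows the semantic inference rule used by MaskFormer-style models. The function name `semantic_map` and the toy inputs are illustrative assumptions, not part of this checkpoint's API.

```python
def semantic_map(class_probs, mask_probs):
    """Combine per-query predictions into a per-pixel class map.

    class_probs: list of Q lists, each with C class probabilities
                 (the extra "no object" class already dropped).
    mask_probs:  list of Q masks, each an H x W nested list of values in [0, 1].
    Returns an H x W nested list of class indices.
    """
    num_classes = len(class_probs[0])
    height, width = len(mask_probs[0]), len(mask_probs[0][0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Score of class c at this pixel: sum over queries of p_q(c) * m_q[y][x]
            scores = [
                sum(p[c] * m[y][x] for p, m in zip(class_probs, mask_probs))
                for c in range(num_classes)
            ]
            row.append(max(range(num_classes), key=scores.__getitem__))
        result.append(row)
    return result


# Toy example: 2 queries, 2 classes, a 1 x 2 image.
toy_class_probs = [[0.9, 0.1], [0.2, 0.8]]
toy_mask_probs = [[[0.9, 0.1]], [[0.1, 0.9]]]
toy_map = semantic_map(toy_class_probs, toy_mask_probs)
```

Instance and panoptic outputs start from the same per-query predictions but keep individual queries apart instead of summing them into one score per class.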

Efficient Attention Mechanism

Utilizes multi-scale deformable attention and masked attention to improve computational efficiency.

Training Optimization

Enhances training efficiency by computing loss on sampled points rather than entire masks.
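
The idea of computing the mask loss on sampled points can be sketched as follows. This is a simplified illustration in plain Python, assuming uniform sampling of integer pixel locations and a binary cross-entropy term only. The actual training code is more elaborate, and the names `sampled_bce_loss` and `num_points` are hypothetical.

```python
import math
import random


def sampled_bce_loss(mask_logits, target, num_points, rng):
    """Binary cross-entropy averaged over randomly sampled pixels.

    mask_logits: H x W nested list of raw mask logits.
    target:      H x W nested list of 0/1 ground-truth values.
    num_points:  how many pixel locations to sample (hypothetical parameter).
    rng:         a random.Random instance, passed in for reproducibility.
    """
    height, width = len(target), len(target[0])
    total = 0.0
    for _ in range(num_points):
        y, x = rng.randrange(height), rng.randrange(width)
        z, t = mask_logits[y][x], target[y][x]
        # Numerically stable BCE with logits: max(z, 0) - z*t + log(1 + exp(-|z|))
        total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
    return total / num_points


# Toy 4 x 4 mask: left half is foreground, and the logits agree with the target.
toy_target = [[1, 1, 0, 0] for _ in range(4)]
toy_logits = [[4.0, 4.0, -4.0, -4.0] for _ in range(4)]
toy_loss = sampled_bce_loss(toy_logits, toy_target, num_points=8, rng=random.Random(0))
```

Because only `num_points` locations are touched, the cost of the loss no longer grows with the mask resolution, which is where the memory and speed savings come from.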

Model Capabilities

Image Instance Segmentation

Multi-scale Feature Extraction

Efficient Mask Prediction

Use Cases

Computer Vision

Street Scene Analysis

Performs instance segmentation on objects in street scene datasets like Cityscapes.

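Identifies and segments individual objects such as vehicles, pedestrians, and riders. Roads and other background regions are not split into instances, since Cityscapes instance annotations cover only countable object classes.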

Object Recognition

Identifies and segments specific object instances in images.

🚀 Mask2Former

The Mask2Former model is trained on Cityscapes instance segmentation (base-IN21k version, Swin backbone). It takes a unified approach to instance, semantic, and panoptic segmentation.

🚀 Quick Start

Mask2Former was introduced in the paper "Masked-attention Mask Transformer for Universal Image Segmentation" and first released in this repository.

Disclaimer: The team releasing Mask2Former did not write a model card for this model, so this model card has been written by the Hugging Face team.

✨ Features

Unified Segmentation Paradigm: Mask2Former addresses instance, semantic, and panoptic segmentation using the same approach by predicting a set of masks and corresponding labels.
Performance and Efficiency: It outperforms the previous state of the art, MaskFormer, in both accuracy and efficiency. It achieves this by replacing the pixel decoder with a more advanced multi-scale deformable attention Transformer, adopting a Transformer decoder with masked attention, and improving training efficiency by calculating the loss on subsampled points instead of whole masks.
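
As a rough illustration of masked attention, the sketch below restricts one query's attention to the locations that the previous decoder layer predicted as foreground, and falls back to ordinary attention when that mask is empty. It is a plain-Python simplification for a single query. The function name, the flat list layout, and the default `threshold` are assumptions of this sketch rather than the library's implementation.

```python
import math


def masked_attention_weights(scores, prev_mask_probs, threshold=0.5):
    """Softmax over attention scores, restricted to the predicted foreground.

    scores:          raw query-key attention scores, one per image location.
    prev_mask_probs: the previous layer's mask probability at each location.
    threshold:       foreground cutoff (0.5 here is an assumption of this sketch).
    """
    keep = [p >= threshold for p in prev_mask_probs]
    if not any(keep):
        # Empty mask: fall back to ordinary attention over all locations.
        keep = [True] * len(scores)
    # Subtract the largest kept score for numerical stability.
    top = max(s for s, k in zip(scores, keep) if k)
    exps = [math.exp(s - top) if k else 0.0 for s, k in zip(scores, keep)]
    total = sum(exps)
    return [e / total for e in exps]


toy_weights = masked_attention_weights(
    scores=[2.0, 2.0, 5.0, 1.0],
    prev_mask_probs=[0.9, 0.8, 0.1, 0.2],
)
```

In the toy call, the third location has the highest raw score but receives zero weight because the previous mask marks it as background.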


📚 Documentation

Intended uses & limitations

You can use this particular checkpoint for instance segmentation. See the model hub for other versions fine-tuned on a task that interests you.

How to use

Here is how to use this model:

import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation


# load Mask2Former fine-tuned on Cityscapes instance segmentation
processor = AutoImageProcessor.from_pretrained("facebook/mask2former-swin-base-IN21k-cityscapes-instance")
model = Mask2FormerForUniversalSegmentation.from_pretrained("facebook/mask2former-swin-base-IN21k-cityscapes-instance")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# model predicts class_queries_logits of shape `(batch_size, num_queries, num_labels + 1)`
# (the extra label is the "no object" class)
# and masks_queries_logits of shape `(batch_size, num_queries, height, width)`
class_queries_logits = outputs.class_queries_logits
masks_queries_logits = outputs.masks_queries_logits

# you can pass them to processor for postprocessing
result = processor.post_process_instance_segmentation(outputs, target_sizes=[image.size[::-1]])[0]
# we refer to the demo notebooks for visualization (see "Resources" section in the Mask2Former docs)
predicted_instance_map = result["segmentation"]
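
Besides the instance map, the post-processed `result` is documented to carry a `segments_info` list with an `id`, `label_id`, and `score` per detected segment. A small helper along the following lines could turn that list into readable rows. The helper name, the `min_score` cutoff, and the stand-in data are hypothetical and only illustrate the expected shape of the output.

```python
def summarize_segments(segments_info, id2label, min_score=0.5):
    """Return (segment id, class name, score) tuples above a score cutoff.

    segments_info: list of dicts with "id", "label_id" and "score" keys,
                   as found in result["segments_info"].
    id2label:      mapping from label id to class name, e.g. model.config.id2label.
    min_score:     hypothetical confidence cutoff chosen for this sketch.
    """
    rows = []
    for seg in segments_info:
        if seg["score"] >= min_score:
            rows.append((seg["id"], id2label[seg["label_id"]], round(seg["score"], 3)))
    # Most confident segments first.
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Stand-in data shaped like the post-processing output (all values are made up).
fake_segments = [
    {"id": 0, "label_id": 2, "score": 0.97},
    {"id": 1, "label_id": 0, "score": 0.31},
    {"id": 2, "label_id": 0, "score": 0.88},
]
fake_id2label = {0: "person", 2: "car"}
summary = summarize_segments(fake_segments, fake_id2label)
```

With the real objects from the snippet above, the call would be `summarize_segments(result["segments_info"], model.config.id2label)`.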

For more code examples, see the Mask2Former documentation.

📄 License

The license for this model is listed as "other".

Property	Details
Tags	vision, image-segmentation
Datasets	coco
Widget Examples	Cats: http://images.cocodataset.org/val2017/000000039769.jpg; Castle: http://images.cocodataset.org/val2017/000000039770.jpg
