Mask2Former Swin Tiny Cityscapes Instance
Mask2Former is a general-purpose image segmentation model built on a Transformer architecture; this version is fine-tuned for instance segmentation on the Cityscapes dataset.
Downloads: 67
Release Time: 1/5/2023
Model Overview
This model adopts a unified paradigm for image segmentation, performing instance segmentation by predicting a set of masks and their corresponding class labels, with improvements in both performance and efficiency over previous models.
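As a usage illustration, the following is a minimal inference sketch using the Hugging Face transformers library. The checkpoint id and the input file name are assumptions for the example, not values taken from this page.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation

# assumed checkpoint id for this Swin-Tiny Cityscapes instance-segmentation variant
checkpoint = "facebook/mask2former-swin-tiny-cityscapes-instance"
processor = AutoImageProcessor.from_pretrained(checkpoint)
model = Mask2FormerForUniversalSegmentation.from_pretrained(checkpoint)

image = Image.open("street_scene.png")  # hypothetical Cityscapes-like street image
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# the model predicts a fixed set of (mask, class) pairs; the processor assembles
# them into an instance segmentation map plus per-instance metadata
result = processor.post_process_instance_segmentation(
    outputs, target_sizes=[image.size[::-1]]  # PIL size is (W, H); target is (H, W)
)[0]
segmentation = result["segmentation"]  # (H, W) tensor of instance ids
for info in result["segments_info"]:
    print(info["id"], model.config.id2label[info["label_id"]], info["score"])
```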
Model Features
Unified Segmentation Architecture
Adopts a unified paradigm for instance segmentation, semantic segmentation, and panoptic segmentation, treating all three as mask classification (predicting a set of masks with class labels)
Efficient Attention Mechanism
Uses a multi-scale deformable attention Transformer as the pixel decoder, improving computational efficiency over earlier pixel-decoder designs
Masked Attention Decoder
Employs a Transformer decoder with masked attention, which restricts each query's cross-attention to its predicted foreground region, improving performance without increasing computational load (a conceptual sketch follows this feature list)
Efficient Training Strategy
Significantly improves training efficiency by computing losses on sampled points rather than entire masks
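As referenced above, here is a conceptual, self-contained sketch of masked cross-attention, not the model's actual implementation: attention logits are computed as usual and then restricted to the foreground region of each query's mask prediction from the previous decoder layer.

```python
import torch
import torch.nn.functional as F

def masked_cross_attention(queries, keys, values, mask_logits, threshold=0.5):
    """Restrict each query's attention to its predicted foreground region.

    queries: (Q, d) object queries; keys/values: (N, d) flattened image features;
    mask_logits: (Q, N) mask prediction from the previous decoder layer.
    """
    scale = queries.shape[-1] ** 0.5
    attn = queries @ keys.T / scale                    # (Q, N) attention logits
    foreground = mask_logits.sigmoid() > threshold     # (Q, N) per-query mask region
    # queries whose predicted mask is empty fall back to full attention,
    # otherwise the softmax over an all -inf row would produce NaNs
    empty = ~foreground.any(dim=-1, keepdim=True)
    masked = attn.masked_fill(~foreground & ~empty, float("-inf"))
    return F.softmax(masked, dim=-1) @ values          # (Q, d) updated queries

# toy example: 4 queries attending over 16 flattened feature locations
q, k, v = torch.randn(4, 32), torch.randn(16, 32), torch.randn(16, 32)
prev_mask_logits = torch.randn(4, 16)
print(masked_cross_attention(q, k, v, prev_mask_logits).shape)  # torch.Size([4, 32])
```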
Model Capabilities
Image Instance Segmentation
Multi-object Detection and Segmentation
Scene Understanding
Use Cases
Autonomous Driving
Road Scene Analysis
Identify and segment elements such as vehicles, pedestrians, and traffic signs on the road
Can be used to build high-precision environmental perception systems (see the instance-filtering sketch after the use cases below)
Urban Management
Urban Infrastructure Monitoring
Automatically identify and segment urban elements such as buildings, roads, and green belts
Assists in urban planning and management decisions
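Building on the inference sketch above (reusing its result and segmentation variables), the snippet below shows one hypothetical way to keep only road-user instances for a road-scene analysis pipeline; the class names assume the Cityscapes instance label set exposed through model.config.id2label.

```python
# hypothetical post-filtering step for road scene analysis
ROAD_USERS = {"person", "rider", "car", "truck", "bus", "train", "motorcycle", "bicycle"}

road_users = [
    info for info in result["segments_info"]
    if model.config.id2label[info["label_id"]] in ROAD_USERS and info["score"] > 0.8
]
for info in road_users:
    instance_mask = segmentation == info["id"]  # boolean (H, W) mask for this instance
    label = model.config.id2label[info["label_id"]]
    print(label, f"pixels={int(instance_mask.sum())}")
```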