Mask2Former Swin Base IN21k Cityscapes Semantic
A general-purpose image segmentation model based on a Swin Transformer backbone that unifies instance, semantic, and panoptic segmentation in a single architecture
Downloads 329
Release Time: 1/16/2023
Model Overview
Mask2Former is an advanced Transformer-based image segmentation model that unifies instance, semantic, and panoptic segmentation by predicting a set of masks and their corresponding class labels.
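Below is a minimal inference sketch using the Hugging Face transformers library. The checkpoint ID and the input image path are illustrative assumptions and may need to be adjusted to the actual repository name.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation

# Assumed checkpoint ID for this model card; adjust to the actual repository name.
checkpoint = "facebook/mask2former-swin-base-IN21k-cityscapes-semantic"

processor = AutoImageProcessor.from_pretrained(checkpoint)
model = Mask2FormerForUniversalSegmentation.from_pretrained(checkpoint)

# Hypothetical input image of a street scene.
image = Image.open("street_scene.jpg")

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Post-process the per-query masks and class logits into a single
# semantic segmentation map (one Cityscapes class ID per pixel).
semantic_map = processor.post_process_semantic_segmentation(
    outputs, target_sizes=[image.size[::-1]]
)[0]
print(semantic_map.shape)  # (height, width)
```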
Model Features
Unified Segmentation Architecture
Uses the same model architecture to handle three segmentation tasks (instance/semantic/panoptic)
Masked Attention Mechanism
An innovative masked-attention Transformer decoder restricts cross-attention to predicted foreground regions, improving performance without increasing computational cost (a minimal sketch follows this feature list)
Efficient Training Strategy
Computes the training loss on sampled points instead of full masks, significantly improving training efficiency
Multi-scale Feature Processing
Employs deformable attention mechanism to effectively capture multi-scale features
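To make the masked-attention feature concrete, here is a minimal, self-contained PyTorch sketch. The tensor shapes, function name, and the 0.5 threshold are illustrative assumptions, not the exact implementation: the idea is that cross-attention logits are suppressed outside the foreground of the mask predicted by the previous decoder layer.

```python
import torch
import torch.nn.functional as F

def masked_cross_attention(queries, keys, values, prev_mask_logits, threshold=0.5):
    """Illustrative masked attention: each query attends only to pixels that
    fall inside the foreground of its mask predicted by the previous layer.

    queries:          (num_queries, dim)
    keys, values:     (num_pixels, dim) flattened image features
    prev_mask_logits: (num_queries, num_pixels) mask logits from the previous layer
    """
    scale = queries.shape[-1] ** 0.5
    attn_logits = queries @ keys.T / scale                # (num_queries, num_pixels)

    # Background positions (mask probability below threshold) are excluded.
    background = prev_mask_logits.sigmoid() < threshold
    # Guard: if a query's predicted mask is empty, let it attend everywhere
    # instead of producing NaNs from an all-masked softmax row.
    empty = background.all(dim=-1, keepdim=True)
    background = background & ~empty

    attn_logits = attn_logits.masked_fill(background, float("-inf"))
    attn = F.softmax(attn_logits, dim=-1)
    return attn @ values                                  # (num_queries, dim)
```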
Model Capabilities
Semantic Segmentation
Instance Segmentation
Panoptic Segmentation
Multi-scale Image Analysis
Object Recognition and Localization
Use Cases
Autonomous Driving
Street Scene Semantic Segmentation
Identifying key street elements such as roads, vehicles, and pedestrians (see the per-class mask example after the use-case list)
Achieves state-of-the-art performance on the Cityscapes dataset
Medical Imaging
Organ Segmentation
Precise segmentation of organ tissues in CT/MRI images
Remote Sensing
Land Cover Classification
Identifying different land types in satellite images
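As an illustration of the street-scene use case, the snippet below builds on the inference sketch above (the model and semantic_map variables are assumed from that sketch) and extracts one binary mask per class of interest; road, car, and person are standard Cityscapes class names.

```python
# Continues from the inference sketch above: `model` and `semantic_map` are assumed.
for class_name in ("road", "car", "person"):
    class_id = model.config.label2id[class_name]
    binary_mask = semantic_map == class_id            # (height, width) boolean tensor
    coverage = binary_mask.float().mean().item() * 100
    print(f"{class_name}: {coverage:.1f}% of the image")
```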