Mask2Former
The Mask2Former model is trained on Cityscapes instance segmentation (large-sized version, Swin backbone). It offers a unified approach for instance, semantic, and panoptic segmentation.
Quick Start
The Mask2Former model trained on Cityscapes instance segmentation (large-sized version, Swin backbone) was introduced in the paper Masked-attention Mask Transformer for Universal Image Segmentation and first released in this repository.
Disclaimer: The team releasing Mask2Former did not write a model card for this model, so this model card has been written by the Hugging Face team.
Features
- Unified Segmentation Paradigm: Mask2Former addresses instance, semantic, and panoptic segmentation using the same approach, treating all three tasks as instance segmentation by predicting a set of masks and corresponding labels.
- Performance and Efficiency: It outperforms the previous SOTA, MaskFormer, in both performance and efficiency through several key improvements:
  - Advanced Attention Mechanism: Replaces the pixel decoder with a more advanced multi-scale deformable attention Transformer.
  - Masked Attention Decoder: Adopts a Transformer decoder with masked attention to boost performance without additional computation.
  - Efficient Training: Improves training efficiency by calculating the loss on subsampled points instead of whole masks (a minimal sketch of this idea follows after this list).
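
The point-sampled loss mentioned in the last item can be illustrated with a short sketch. This is not the repository's training code: Mask2Former combines uniform and importance sampling of points (following PointRend), and the function name, purely uniform sampling, and point count used here are illustrative assumptions.

import torch
import torch.nn.functional as F

def point_sampled_mask_loss(mask_logits, gt_masks, num_points=12544):
    # Simplified sketch (assumption): binary cross-entropy evaluated only at randomly
    # sampled points instead of at every pixel of each (num_masks, H, W) mask.
    n = mask_logits.shape[0]
    # Random point coordinates in [0, 1], mapped to grid_sample's [-1, 1] range.
    point_coords = torch.rand(n, num_points, 1, 2, device=mask_logits.device)
    grid = 2.0 * point_coords - 1.0
    # Bilinearly sample predictions and targets at the same point locations.
    sampled_logits = F.grid_sample(mask_logits.unsqueeze(1), grid, align_corners=False).squeeze(1).squeeze(-1)
    sampled_targets = F.grid_sample(gt_masks.unsqueeze(1).float(), grid, align_corners=False).squeeze(1).squeeze(-1)
    return F.binary_cross_entropy_with_logits(sampled_logits, sampled_targets)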

Installation
The original card lists no dedicated installation steps; the usage example below only requires the transformers library together with torch, Pillow, and requests (for example, pip install transformers torch pillow requests).
Usage Examples
Basic Usage
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation

# Load the image processor and the Mask2Former checkpoint fine-tuned on Cityscapes instance segmentation.
processor = AutoImageProcessor.from_pretrained("facebook/mask2former-swin-large-cityscapes-instance")
model = Mask2FormerForUniversalSegmentation.from_pretrained("facebook/mask2former-swin-large-cityscapes-instance")

# Download an example image and prepare it as model input.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt")

# Run inference without tracking gradients.
with torch.no_grad():
    outputs = model(**inputs)

# The model predicts one class and one binary-mask logit map per query.
class_queries_logits = outputs.class_queries_logits
masks_queries_logits = outputs.masks_queries_logits

# Post-process the raw outputs into an instance segmentation map at the original image size.
result = processor.post_process_instance_segmentation(outputs, target_sizes=[image.size[::-1]])[0]
predicted_instance_map = result["segmentation"]
Advanced Usage
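The original card leaves this part empty. As a hedged extension of the basic example above (reusing its model, result, and predicted_instance_map objects), the sketch below shows how the per-instance metadata returned by post_process_instance_segmentation can be inspected.

# Continues from the Basic Usage example above.
# Each detected instance is listed in result["segments_info"] with an id, a class id, and a confidence score.
for segment in result["segments_info"]:
    label = model.config.id2label[segment["label_id"]]
    print(f"instance {segment['id']}: {label} (score {segment['score']:.3f})")

# The instance map assigns every pixel the id of the instance it belongs to,
# so a binary mask for a single instance can be obtained by comparison.
if result["segments_info"]:
    first_id = result["segments_info"][0]["id"]
    first_instance_mask = predicted_instance_map == first_id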
Documentation
Intended uses & limitations
You can use this particular checkpoint for instance segmentation. See the model hub for other fine-tuned versions covering a task that interests you.
How to use
For more code examples, refer to the documentation.
Technical Details
Mask2Former addresses instance, semantic, and panoptic segmentation with the same paradigm: by predicting a set of masks and corresponding labels. It outperforms MaskFormer through improvements such as replacing the pixel decoder with a multi-scale deformable attention Transformer, adopting a masked-attention Transformer decoder, and calculating the loss on subsampled points to improve training efficiency.
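
To make the set-prediction paradigm concrete, the sketch below shows one way the per-query class scores and mask logits from the usage example above can be combined into a dense per-pixel class map (at the model's internal mask resolution). This mirrors the published MaskFormer/Mask2Former inference formulation but is not the library's exact post-processing code; in practice, use the processor's post_process_* methods.

import torch

# Drop the trailing "no object" class and turn logits into probabilities.
class_probs = outputs.class_queries_logits.softmax(dim=-1)[..., :-1]  # (batch, num_queries, num_classes)
mask_probs = outputs.masks_queries_logits.sigmoid()                   # (batch, num_queries, h, w)

# Weight each query's mask by its class probabilities, sum over queries,
# and take the per-pixel argmax to obtain a dense class map.
scores = torch.einsum("bqc,bqhw->bchw", class_probs, mask_probs)
dense_class_map = scores.argmax(dim=1)                                 # (batch, h, w)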
License
The license for this model is "other".
| Property | Details |
|----------|---------|
| Model Type | Mask2Former model trained on Cityscapes instance segmentation (large-sized version, Swin backbone) |
| Training Data | Cityscapes, COCO |