mask2former-swin-small-cityscapes-panoptic Open-source Model - Optimizing the Cityscapes Panoptic Segmentation Task

Mask2former Swin Small Cityscapes Panoptic

Developed by facebook

A compact Mask2Former model with a Swin backbone, fine-tuned for panoptic segmentation on the Cityscapes dataset

Image Segmentation

Transformers

Open Source License: Other #Panoptic Segmentation #Unified Multi-task Framework #Swin Backbone Network

Downloads 568

Release Time: 1/3/2023

Model Overview

Mask2Former is a universal image segmentation framework that handles instance, semantic, and panoptic segmentation in the same way: by predicting a set of masks and a class label for each mask. This specific checkpoint is fine-tuned for panoptic segmentation of urban street scenes.
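
To make the "set of masks and corresponding labels" idea concrete, the sketch below shows one common way such predictions can be collapsed into a per-pixel semantic map: weight each query's mask probability by its class probabilities (dropping the trailing "no object" class) and take the best class at every pixel. The function name `semantic_map` and the nested-list inputs are assumptions made for illustration; this is a simplified pure-Python sketch, not the library's post-processing code.

```python
import math


def softmax(values):
    """Softmax over a flat list of logits."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def semantic_map(class_logits, mask_logits):
    """Combine per-query class and mask logits into a per-pixel class map.

    class_logits: [num_queries][num_classes + 1], last entry = "no object".
    mask_logits:  [num_queries][height][width].
    Returns a [height][width] grid of class indices.
    """
    num_classes = len(class_logits[0]) - 1
    # Drop the "no object" column after normalising.
    class_probs = [softmax(row)[:num_classes] for row in class_logits]
    height, width = len(mask_logits[0]), len(mask_logits[0][0])

    result = []
    for y in range(height):
        row = []
        for x in range(width):
            scores = [0.0] * num_classes
            for query, probs in enumerate(class_probs):
                mask_prob = sigmoid(mask_logits[query][y][x])
                for cls in range(num_classes):
                    scores[cls] += probs[cls] * mask_prob
            row.append(max(range(num_classes), key=scores.__getitem__))
        result.append(row)
    return result
```

Panoptic output additionally needs to keep separate instances of the same class apart, which this sketch does not attempt.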

Model Features

Unified Segmentation Framework

Treats instance, semantic, and panoptic segmentation as the same mask prediction task, which simplifies the processing pipeline

Efficient Attention Mechanism

Uses a multi-scale deformable attention Transformer as the pixel decoder, replacing the simpler pixel decoder of MaskFormer

Masked Attention Decoder

Adds a Transformer decoder with masked attention, which improves performance without introducing additional computation

Efficient Training Strategy

Reduces training cost by computing the mask loss on sampled points rather than on entire masks
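
The masked attention idea listed above can be illustrated with a small sketch: when a query attends over pixel locations, the attention logits of locations outside the query's previously predicted mask are set to negative infinity, so the softmax only distributes weight inside the predicted foreground. The function name, the 0.5 threshold default, and the "attend everywhere if the mask is empty" fallback are assumptions of this sketch, not a copy of the model's implementation.

```python
import math


def masked_attention_weights(scores, prev_mask_probs, threshold=0.5):
    """Softmax over attention logits, restricted to a predicted foreground.

    scores:          raw attention logits of one query over N locations.
    prev_mask_probs: that query's mask probabilities from the previous
                     decoder layer, one per location.
    """
    keep = [prob >= threshold for prob in prev_mask_probs]
    if not any(keep):
        # Assumed safeguard: an empty mask would make every logit -inf,
        # so fall back to ordinary (unmasked) attention.
        keep = [True] * len(scores)

    masked = [s if k else -math.inf for s, k in zip(scores, keep)]
    peak = max(masked)
    exps = [math.exp(s - peak) for s in masked]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]
```

Because the mask only changes which logits survive the softmax, the attention operation itself costs the same as before, which is consistent with the "no additional computation" claim.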

Model Capabilities

Image Segmentation

Street Scene Understanding

Object Recognition and Localization

Panoptic Segmentation

Use Cases

Intelligent Transportation Systems

Street Scene Element Analysis

Segments and classifies vehicles, pedestrians, traffic signs, and other elements of urban road scenes

Can be used for traffic flow monitoring and urban planning

Autonomous Driving

Environmental Perception

Real-time identification and segmentation of various objects in road scenes

Provides precise environmental understanding for autonomous driving systems

🚀 Mask2Former

The Mask2Former model is designed for image segmentation tasks, offering a unified approach to handle instance, semantic, and panoptic segmentation.

🚀 Quick Start

This is the Mask2Former model trained on Cityscapes panoptic segmentation (small-sized version, Swin backbone). It was introduced in the paper Masked-attention Mask Transformer for Universal Image Segmentation and first released in this repository.

Disclaimer: The team releasing Mask2Former did not write a model card for this model so this model card has been written by the Hugging Face team.

✨ Features

- Unified Segmentation Paradigm: Mask2Former addresses instance, semantic, and panoptic segmentation with the same paradigm: predicting a set of masks and corresponding labels. All three tasks are treated as if they were instance segmentation.
- Performance and Efficiency: Outperforms the previous SOTA, MaskFormer, in terms of both performance and efficiency, through the three changes listed next.
- Advanced Attention Mechanism: Replaces the pixel decoder with a more advanced multi-scale deformable attention Transformer.
- Masked Attention Decoder: Adopts a Transformer decoder with masked attention to boost performance without introducing additional computation.
- Improved Training Efficiency: Calculates the loss on subsampled points instead of whole masks.
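
As a rough illustration of the last point, the sketch below computes a binary cross-entropy mask loss on a handful of sampled pixel locations instead of on every pixel. It is a simplification under stated assumptions: points are drawn uniformly at integer pixel coordinates, only a cross-entropy term is shown, and the names `bce_with_logits` and `sampled_mask_loss` are hypothetical. The sampling strategy used to train the actual model may differ.

```python
import math
import random


def bce_with_logits(logit, target):
    """Numerically stable binary cross-entropy for a single logit."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def sampled_mask_loss(pred_logits, target_mask, num_points, rng):
    """Average the loss over `num_points` randomly chosen pixels.

    pred_logits: [height][width] mask logits for one predicted mask.
    target_mask: [height][width] ground-truth values in {0, 1}.
    rng:         a random.Random instance (assumed uniform sampling).
    """
    height, width = len(pred_logits), len(pred_logits[0])
    coords = [(rng.randrange(height), rng.randrange(width))
              for _ in range(num_points)]
    losses = [bce_with_logits(pred_logits[y][x], target_mask[y][x])
              for y, x in coords]
    return sum(losses) / len(losses)
```

The cost of the loss then scales with `num_points` rather than with `height * width`, which is where the training savings come from.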

📚 Documentation

Intended uses & limitations

You can use this particular checkpoint for panoptic segmentation. See the model hub to look for other fine-tuned versions on a task that interests you.

How to use

Here is how to use this model:

import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation


# load Mask2Former fine-tuned on Cityscapes panoptic segmentation
processor = AutoImageProcessor.from_pretrained("facebook/mask2former-swin-small-cityscapes-panoptic")
model = Mask2FormerForUniversalSegmentation.from_pretrained("facebook/mask2former-swin-small-cityscapes-panoptic")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# model predicts class_queries_logits of shape `(batch_size, num_queries, num_labels + 1)`
# (the extra label is the "no object" class)
# and masks_queries_logits of shape `(batch_size, num_queries, height, width)`
class_queries_logits = outputs.class_queries_logits
masks_queries_logits = outputs.masks_queries_logits

# you can pass them to processor for postprocessing
result = processor.post_process_panoptic_segmentation(outputs, target_sizes=[image.size[::-1]])[0]
# we refer to the demo notebooks for visualization (see "Resources" section in the Mask2Former docs)
predicted_panoptic_map = result["segmentation"]

For more code examples, see the documentation.
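
As one further illustration, the dictionary returned by `post_process_panoptic_segmentation` also carries a `segments_info` list alongside the `segmentation` map, with one entry per predicted segment including its `label_id`. The helper below is a minimal sketch that counts segments per class name; the function name `count_segments_by_label` is hypothetical, and the label mapping is passed in by the caller.

```python
from collections import Counter


def count_segments_by_label(segments_info, id2label):
    """Count predicted segments per class name.

    segments_info: list of dicts, each with at least a "label_id" key.
    id2label:      mapping from integer label id to class name.
    """
    counts = Counter()
    for segment in segments_info:
        label_id = segment["label_id"]
        # Fall back to a placeholder name for ids missing from the mapping.
        counts[id2label.get(label_id, f"unknown({label_id})")] += 1
    return dict(counts)
```

With the snippet above, this could be called as `count_segments_by_label(result["segments_info"], model.config.id2label)`.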

📄 License

License: other

Property	Details
Tags	vision, image-segmentation
Datasets	coco
