Open-source model of mask2former-swin-large-mapillary-vistas-panoptic - A practical tool for panoptic segmentation tasks

Mask2former Swin Large Mapillary Vistas Panoptic

Developed by facebook

Large-scale Mask2Former version based on Swin backbone network, specifically designed for panoptic segmentation tasks, trained on the Mapillary Vistas dataset

Image Segmentation

Transformers

Open Source License: Other #Panoptic segmentation #Unified multi-task framework #Swin backbone network

Downloads 2,750

Release Time: 1/5/2023

Model Overview

Mask2Former is a unified image segmentation framework that handles instance segmentation, semantic segmentation, and panoptic segmentation tasks by predicting a set of masks and corresponding labels. Compared to its predecessor MaskFormer, it shows significant improvements in both performance and efficiency.
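To make the "set of masks and corresponding labels" idea concrete, here is a minimal NumPy sketch of how such predictions can be collapsed into a semantic map. It follows the usual MaskFormer-style recipe (class probabilities weighted by mask probabilities); the function name `semantic_map_from_queries` and the toy inputs are illustrative assumptions, not part of the released code.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def semantic_map_from_queries(class_logits, mask_logits):
    """Hypothetical helper, for illustration only.

    class_logits: (num_queries, num_labels + 1); the last column is assumed
                  to be the "no object" class.
    mask_logits:  (num_queries, H, W).
    Returns an (H, W) array of label ids.
    """
    class_probs = softmax(class_logits, axis=-1)[:, :-1]  # drop "no object"
    mask_probs = sigmoid(mask_logits)
    # Per-pixel score for each label: sum over queries of
    # (probability the query is that label) * (probability the pixel is in its mask).
    scores = np.einsum("qc,qhw->chw", class_probs, mask_probs)
    return scores.argmax(axis=0)

# Toy input: 2 queries, 2 labels, a 2x2 image.
class_logits = np.array([[5.0, 0.0, 0.0],
                         [0.0, 5.0, 0.0]])
mask_logits = np.array([[[4.0, 4.0], [-4.0, -4.0]],
                        [[-4.0, -4.0], [4.0, 4.0]]])
semantic = semantic_map_from_queries(class_logits, mask_logits)
```

The same per-query outputs can instead be kept as separate segments, which is what instance and panoptic post-processing do.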

Model Features

Unified segmentation framework

Handles instance, semantic, and panoptic segmentation with a single paradigm: predicting a set of masks and corresponding labels, so all three tasks are treated as if they were instance segmentation

Multi-scale deformable attention

Uses multi-scale deformable attention Transformer to upgrade the pixel decoder, improving performance

Masked attention mechanism

Introduces a Transformer decoder with masked attention, which improves performance without introducing additional computation

Efficient training

Significantly improves training efficiency by calculating loss values through subsampled points
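The masked attention mechanism listed among the features above can be sketched in a few lines of NumPy. Each query attends only to the locations that the previous decoder layer predicted as foreground for that query; all other attention logits are set to negative infinity before the softmax. This is a simplified single-head sketch under stated assumptions: the function name is hypothetical, learned projections and the residual connection are omitted, and the fallback for an empty mask is a guard added here so the softmax stays well defined.

```python
import numpy as np

def masked_cross_attention(queries, keys, values, prev_mask_logits):
    """Illustrative single-head masked cross-attention (not the released code).

    queries:          (Q, d) one embedding per object query
    keys, values:     (N, d) one embedding per image location
    prev_mask_logits: (Q, N) mask logits predicted by the previous layer
    """
    d = queries.shape[-1]
    logits = queries @ keys.T / np.sqrt(d)          # (Q, N)
    foreground = prev_mask_logits > 0.0             # same as sigmoid(logit) > 0.5
    # Guard (assumption of this sketch): a query with an empty mask
    # attends to every location instead of to none.
    empty = ~foreground.any(axis=1, keepdims=True)
    allowed = foreground | empty
    logits = np.where(allowed, logits, -np.inf)
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ values, weights

rng = np.random.default_rng(0)
queries = rng.normal(size=(1, 2))
keys = rng.normal(size=(4, 2))
values = rng.normal(size=(4, 2))
# The previous layer marked locations 0 and 1 as foreground for this query.
prev_mask_logits = np.array([[2.0, 2.0, -2.0, -2.0]])
attended, weights = masked_cross_attention(queries, keys, values, prev_mask_logits)
```

Because the mask only changes which logits survive the softmax, the attention itself costs the same as unmasked cross-attention.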

Model Capabilities

Image segmentation

Panoptic segmentation

Instance segmentation

Semantic segmentation

Use Cases

Computer vision

Street scene understanding

Used for panoptic segmentation in street scene datasets like Mapillary Vistas

Accurately identifies and segments various objects in street scenes

Object recognition and segmentation

Identifies objects in images and generates precise masks

As shown in examples like cat and castle recognition

🚀 Mask2Former

The Mask2Former model is trained on Mapillary Vistas panoptic segmentation (large-sized version, Swin backbone). It offers a unified solution for instance, semantic, and panoptic segmentation.

🚀 Quick Start

The Mask2Former model, trained on Mapillary Vistas panoptic segmentation (large-sized version with a Swin backbone), was introduced in the paper Masked-attention Mask Transformer for Universal Image Segmentation and first released in this repository.

Disclaimer: The team releasing Mask2Former did not write a model card for this model, so this model card has been written by the Hugging Face team.

✨ Features

Unified Segmentation Paradigm: Mask2Former addresses instance, semantic, and panoptic segmentation with the same paradigm: predicting a set of masks and corresponding labels. All three tasks are therefore treated as if they were instance segmentation.
Performance Improvement: It outperforms the previous SOTA, MaskFormer, in both performance and efficiency. It does so by (i) replacing the pixel decoder with a more advanced multi-scale deformable attention Transformer, (ii) adopting a Transformer decoder with masked attention, which boosts performance without introducing additional computation, and (iii) improving training efficiency by calculating the loss on subsampled points instead of whole masks.
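The idea of computing the loss on subsampled points can be illustrated with a small NumPy sketch. Instead of evaluating binary cross-entropy on every pixel of a mask, the loss is evaluated on a random subset of locations. This is a simplification under stated assumptions: the function name is hypothetical, points are drawn uniformly at integer pixel positions, and the importance-based sampling and interpolation used in practice are left out.

```python
import numpy as np

def point_sampled_bce(mask_logits, target_mask, num_points, rng):
    """Binary cross-entropy on `num_points` randomly chosen pixels.

    mask_logits, target_mask: (H, W) arrays. Illustrative helper only.
    """
    h, w = mask_logits.shape
    idx = rng.choice(h * w, size=num_points, replace=False)
    logits = mask_logits.reshape(-1)[idx]
    targets = target_mask.reshape(-1)[idx].astype(np.float64)
    # Numerically stable BCE with logits:
    # max(x, 0) - x * t + log(1 + exp(-|x|))
    loss = (np.maximum(logits, 0.0) - logits * targets
            + np.log1p(np.exp(-np.abs(logits))))
    return float(loss.mean())

rng = np.random.default_rng(0)
target = np.zeros((64, 64))
target[16:48, 16:48] = 1.0
# Synthetic prediction: roughly correct logits plus noise.
logits = np.where(target == 1.0, 3.0, -3.0) + rng.normal(scale=0.5, size=target.shape)

full_loss = point_sampled_bce(logits, target, 64 * 64, rng)   # every pixel
sampled_loss = point_sampled_bce(logits, target, 256, rng)    # 256 of 4096 pixels
```

In this toy setting the sampled estimate stays close to the full-mask loss while touching only a fraction of the pixels, which is where the memory and compute savings during training come from.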


📚 Documentation

Intended uses & limitations

You can use this particular checkpoint for panoptic segmentation. Check the model hub to find other fine-tuned versions for tasks that interest you.

How to use

Here is how to use this model:

import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation


# load Mask2Former fine-tuned on Mapillary Vistas panoptic segmentation
processor = AutoImageProcessor.from_pretrained("facebook/mask2former-swin-large-mapillary-vistas-panoptic")
model = Mask2FormerForUniversalSegmentation.from_pretrained("facebook/mask2former-swin-large-mapillary-vistas-panoptic")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# model predicts class_queries_logits of shape `(batch_size, num_queries, num_labels + 1)`
# (the extra label is the "no object" class)
# and masks_queries_logits of shape `(batch_size, num_queries, height, width)`
class_queries_logits = outputs.class_queries_logits
masks_queries_logits = outputs.masks_queries_logits

# you can pass them to processor for postprocessing
result = processor.post_process_panoptic_segmentation(outputs, target_sizes=[image.size[::-1]])[0]
# we refer to the demo notebooks for visualization (see "Resources" section in the Mask2Former docs)
predicted_panoptic_map = result["segmentation"]
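Besides the `"segmentation"` map, the dictionary returned by `post_process_panoptic_segmentation` also contains a `"segments_info"` list whose entries carry an `"id"`, a `"label_id"`, and a `"score"` for each segment. The sketch below shows one way to turn that into a readable summary. The helper name `summarize_segments` is hypothetical, and the arrays and label names are synthetic stand-ins so the example runs without downloading the model.

```python
import numpy as np

def summarize_segments(segmentation, segments_info, id2label):
    """Return (segment id, label name, pixel area) for each segment.

    Hypothetical helper; `segmentation` is an (H, W) array of segment ids.
    """
    seg = np.asarray(segmentation)
    rows = []
    for info in segments_info:
        area = int((seg == info["id"]).sum())
        rows.append((info["id"], id2label[info["label_id"]], area))
    return rows

# Synthetic stand-ins for result["segmentation"], result["segments_info"],
# and model.config.id2label (label names here are made up for illustration).
segmentation = np.array([[1, 1, 2],
                         [1, 2, 2]])
segments_info = [
    {"id": 1, "label_id": 0, "score": 0.9},
    {"id": 2, "label_id": 1, "score": 0.8},
]
id2label = {0: "road", 1: "car"}

rows = summarize_segments(segmentation, segments_info, id2label)
for segment_id, label, area in rows:
    print(f"segment {segment_id}: {label}, {area} px")
```

With the real outputs from the snippet above, the call would look like `summarize_segments(result["segmentation"].cpu().numpy(), result["segments_info"], model.config.id2label)`.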

For more code examples, refer to the documentation.

📄 License

License: other

Property	Details
Tags	vision, image-segmentation
Datasets	coco
