Mask2Former-Swin-Small-ADE-Semantic Open-Source Model - Unified Handling of Image Segmentation Tasks, Small Size and Super Practical

Mask2Former Swin Small ADE Semantic

Developed by facebook

Small-sized Mask2Former model for ADE20k semantic segmentation based on Swin backbone network, using a unified paradigm for image segmentation tasks

Image Segmentation

Transformers

Open Source License:Other #Unified Image Segmentation #Multi-scale Attention #Mask Prediction

Downloads 3,265

Release Time : 1/5/2023

Model Overview

Mask2Former is an image segmentation model that handles instance, semantic, and panoptic segmentation by predicting a set of masks and their corresponding labels. It improves on its predecessor, MaskFormer, in both performance and efficiency.

Model Features

Unified Segmentation Paradigm

Treats instance segmentation, semantic segmentation, and panoptic segmentation uniformly as instance segmentation, simplifying the task workflow

Efficient Attention Mechanism

Utilizes multi-scale deformable attention Transformer and mask attention mechanism to improve performance without increasing computational cost

Efficient Training Method

Significantly enhances training efficiency by computing loss on sampled points rather than entire masks
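
To make the idea of a point-sampled loss concrete, here is a minimal sketch in plain Python. It assumes uniform random sampling of pixel locations and a binary cross-entropy loss; the function names (`bce_with_logits`, `point_sampled_mask_loss`) and the toy data are illustrative and not taken from the Mask2Former code, whose training procedure may choose points differently (for example, favouring uncertain regions).

```python
import math
import random


def bce_with_logits(logit, target):
    """Numerically stable binary cross-entropy for one logit/target pair."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def point_sampled_mask_loss(mask_logits, target_mask, num_points, rng):
    """Average BCE over `num_points` sampled pixels instead of the whole mask.

    Assumption: points are drawn uniformly at random (with replacement)
    from integer pixel coordinates.
    """
    height, width = len(mask_logits), len(mask_logits[0])
    total = 0.0
    for _ in range(num_points):
        y, x = rng.randrange(height), rng.randrange(width)
        total += bce_with_logits(mask_logits[y][x], float(target_mask[y][x]))
    return total / num_points


# Toy 4x4 example: the predicted logits agree in sign with the target everywhere.
target = [[1, 1, 0, 0] for _ in range(4)]
logits = [[4.0 if t else -4.0 for t in row] for row in target]
loss = point_sampled_mask_loss(logits, target, num_points=8, rng=random.Random(0))
```

The cost of the loss then scales with `num_points` rather than with the number of pixels in the mask, which is where the training-efficiency gain comes from.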

Model Capabilities

Image Semantic Segmentation

Instance Segmentation

Panoptic Segmentation

Use Cases

Computer Vision

Scene Understanding

Accurate segmentation and classification of objects in complex scenes

Can accurately identify and segment 150 object categories in the ADE20k dataset

Autonomous Driving

Road scene parsing, identifying vehicles, pedestrians, roads, etc.

🚀 Mask2Former

The Mask2Former model is designed for image segmentation tasks, offering a unified approach to handle instance, semantic, and panoptic segmentation.

🚀 Quick Start

The Mask2Former model, trained on ADE20k semantic segmentation (small-sized version, Swin backbone), was introduced in the paper Masked-attention Mask Transformer for Universal Image Segmentation and first released in this repository.

Disclaimer: The team releasing Mask2Former did not write a model card for this model, so this model card has been written by the Hugging Face team.

✨ Features

Unified Paradigm: Mask2Former addresses instance, semantic, and panoptic segmentation using the same paradigm by predicting a set of masks and corresponding labels, treating all three tasks as instance segmentation.
Performance and Efficiency: It outperforms the previous SOTA, MaskFormer, in terms of both performance and efficiency. This is achieved by:
- Replacing the pixel decoder with a more advanced multi-scale deformable attention Transformer.
- Adopting a Transformer decoder with masked attention to boost performance without introducing additional computation.
- Improving training efficiency by calculating the loss on subsampled points instead of whole masks.
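
The masked-attention idea from the list above can be sketched for a single query as follows. This is a simplified illustration in plain Python, not the library's implementation: the function name `masked_attention_weights`, the 0.5 threshold, and the fallback for an empty mask are assumptions made for the example.

```python
import math


def masked_attention_weights(scores, prev_mask_probs, threshold=0.5):
    """Softmax over spatial locations, restricted to the predicted foreground.

    scores: raw query-key scores for one query, one value per location.
    prev_mask_probs: the mask probabilities this query predicted in the
        previous decoder layer, one value per location.
    """
    keep = [p >= threshold for p in prev_mask_probs]
    if not any(keep):
        # Assumed fallback: an empty mask lets the query attend everywhere,
        # so the softmax is never taken over zero locations.
        keep = [True] * len(scores)
    peak = max(s for s, k in zip(scores, keep) if k)
    exps = [math.exp(s - peak) if k else 0.0 for s, k in zip(scores, keep)]
    total = sum(exps)
    return [e / total for e in exps]


# Four locations; the previous layer marked only the first two as foreground.
weights = masked_attention_weights(
    scores=[2.0, 1.0, 0.0, 3.0],
    prev_mask_probs=[0.9, 0.8, 0.1, 0.2],
)
# The two background locations receive weight 0.0, even though the last one
# has the highest raw score.
```

The masking only changes which scores enter the softmax, which is why it adds no extra computation compared with standard cross-attention.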

📚 Documentation

Intended uses & limitations

You can use this particular checkpoint for semantic segmentation. See the model hub to look for other fine-tuned versions on a task that interests you.

How to use

Here is how to use this model:

import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation


# load Mask2Former fine-tuned on ADE20k semantic segmentation
processor = AutoImageProcessor.from_pretrained("facebook/mask2former-swin-small-ade-semantic")
model = Mask2FormerForUniversalSegmentation.from_pretrained("facebook/mask2former-swin-small-ade-semantic")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# model predicts class_queries_logits of shape `(batch_size, num_queries, num_labels + 1)`
# (the extra class is "no object") and masks_queries_logits of shape `(batch_size, num_queries, height, width)`
class_queries_logits = outputs.class_queries_logits
masks_queries_logits = outputs.masks_queries_logits

# you can pass them to processor for postprocessing
predicted_semantic_map = processor.post_process_semantic_segmentation(outputs, target_sizes=[image.size[::-1]])[0]
# we refer to the demo notebooks for visualization (see "Resources" section in the Mask2Former docs)

For more code examples, we refer to the documentation.
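
To show how the two outputs combine into a semantic map, here is a rough pure-Python sketch. It assumes the post-processing follows the usual mask-classification rule: a softmax over class logits with the "no object" class dropped, a sigmoid over mask logits, a sum over queries, then an argmax per pixel. The helper names are hypothetical, and resizing to `target_sizes` is left out.

```python
import math


def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    # Branch on the sign so math.exp never overflows.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def semantic_map(class_logits, mask_logits):
    """Combine per-query outputs into a [height][width] grid of label ids.

    class_logits: [num_queries][num_labels + 1], last entry is "no object".
    mask_logits:  [num_queries][height][width].
    """
    num_queries = len(class_logits)
    num_labels = len(class_logits[0]) - 1
    class_probs = [softmax(row)[:num_labels] for row in class_logits]
    height, width = len(mask_logits[0]), len(mask_logits[0][0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Score of each label at this pixel, summed over all queries.
            scores = [
                sum(class_probs[q][c] * sigmoid(mask_logits[q][y][x])
                    for q in range(num_queries))
                for c in range(num_labels)
            ]
            row.append(max(range(num_labels), key=scores.__getitem__))
        result.append(row)
    return result


# Toy case: query 0 predicts label 0 on the left pixel, query 1 label 1 on the right.
toy_map = semantic_map(
    class_logits=[[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]],
    mask_logits=[[[5.0, -5.0]], [[-5.0, 5.0]]],
)
```

In practice, `processor.post_process_semantic_segmentation` does this work on tensors and also resizes the result to the requested image size.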

📄 License

The license for this model is "other".

Property	Details
Model Type	Mask2Former model trained on ADE20k semantic segmentation (small-sized version, Swin backbone)
Training Data	ADE20k, COCO
Tags	vision, image-segmentation
Example Images	Cats, Castle
