Mask2Former Swin Base ADE Semantic
A general-purpose image segmentation model trained on the ADE20k dataset, using a unified framework to handle instance, semantic, and panoptic segmentation tasks.
Downloads: 2,811
Release Time: 1/5/2023
Model Overview
Mask2Former is a Transformer-based general-purpose image segmentation model that unifies instance segmentation, semantic segmentation, and panoptic segmentation by predicting a set of masks and their corresponding labels. Compared to its predecessor MaskFormer, it shows significant improvements in both performance and efficiency.
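For reference, below is a minimal inference sketch using the Hugging Face transformers API. It assumes the checkpoint is published as facebook/mask2former-swin-base-ade-semantic and that PyTorch, transformers, and Pillow are installed; the image path is a placeholder.

```python
# Minimal semantic-segmentation sketch (assumes the facebook/mask2former-swin-base-ade-semantic
# checkpoint and the transformers + PyTorch + Pillow stack).
import torch
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation

checkpoint = "facebook/mask2former-swin-base-ade-semantic"
processor = AutoImageProcessor.from_pretrained(checkpoint)
model = Mask2FormerForUniversalSegmentation.from_pretrained(checkpoint)

image = Image.open("scene.jpg")                 # placeholder path: any RGB scene image
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)                   # per-query class logits and mask logits

# Combine the predicted masks and labels into an (H, W) map of ADE20k class ids.
semantic_map = processor.post_process_semantic_segmentation(
    outputs, target_sizes=[image.size[::-1]]
)[0]
print(semantic_map.shape)
```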
Model Features
Unified Segmentation Framework
Treats instance segmentation, semantic segmentation, and panoptic segmentation under a single mask-classification paradigm, handling all three as if they were instance segmentation
Efficient Attention Mechanism
Replaces the traditional pixel decoder with a multi-scale deformable attention Transformer
Masked Attention Decoder
Introduces a Transformer decoder with masked attention that improves performance without extra computational cost (see the sketch after this list)
Efficient Training Strategy
Significantly improves training efficiency by computing loss on sampled points rather than entire masks
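To illustrate the masked attention idea referenced above, the sketch below restricts each query's cross-attention to the foreground region predicted by the previous decoder layer. It is a simplified, hypothetical rendering (single head, no positional embeddings, made-up tensor shapes), not the reference implementation.

```python
import torch
import torch.nn.functional as F

def masked_cross_attention(queries, keys, values, mask_logits):
    """Single-head sketch: queries (Q, d), keys/values (P, d),
    mask_logits (Q, P) = mask predictions from the previous decoder layer."""
    scores = queries @ keys.T / keys.shape[-1] ** 0.5   # (Q, P) attention logits
    attn_mask = mask_logits.sigmoid() < 0.5             # True = outside the predicted mask
    # If a query's predicted mask is empty, fall back to full attention
    # so its softmax row is not entirely -inf.
    empty = attn_mask.all(dim=-1, keepdim=True)
    attn_mask = attn_mask & ~empty
    scores = scores.masked_fill(attn_mask, float("-inf"))
    weights = F.softmax(scores, dim=-1)                  # attend only inside the mask
    return weights @ values                              # (Q, d) updated query features

# Toy usage: 4 queries attending over 16 pixel features of dimension 32.
q, k = torch.randn(4, 32), torch.randn(16, 32)
out = masked_cross_attention(q, k, k, mask_logits=torch.randn(4, 16))
print(out.shape)  # torch.Size([4, 32])
```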
Model Capabilities
Instance Segmentation
Semantic Segmentation
Panoptic Segmentation
Multi-scale Image Analysis
Use Cases
Computer Vision
Scene Understanding
Accurate segmentation and classification of objects in complex scenes
Can recognize the 150 semantic categories of the ADE20k dataset (see the label-inspection sketch after these use cases)
Autonomous Driving
Real-time semantic segmentation of road scenes
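Each pixel of the predicted map is one of the 150 ADE20k class ids. A quick way to inspect the label names is to read them off the checkpoint's configuration, as in the sketch below; it assumes the same facebook/mask2former-swin-base-ade-semantic checkpoint name used earlier.

```python
# Sketch: list the ADE20k class names carried in the checkpoint's config
# (assumes the facebook/mask2former-swin-base-ade-semantic checkpoint, as above).
from transformers import AutoConfig

config = AutoConfig.from_pretrained("facebook/mask2former-swin-base-ade-semantic")
print(len(config.id2label))               # expected: 150 semantic categories
print(list(config.id2label.items())[:5])  # first few (id, label) pairs
```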