# Mask2Former

The Mask2Former model is trained on ADE20k semantic segmentation (large-sized version, Swin backbone). It offers a unified approach to multiple image segmentation tasks.
## Quick Start

Mask2Former handles instance, semantic, and panoptic segmentation within the same framework. It was introduced in the paper [Masked-attention Mask Transformer for Universal Image Segmentation](https://arxiv.org/abs/2112.01527) and first released in [this repository](https://github.com/facebookresearch/Mask2Former).
## Features

- Unified segmentation paradigm: addresses instance, semantic, and panoptic segmentation by predicting a set of masks and corresponding labels, treating all three tasks as instance segmentation.
- Performance and efficiency: outperforms the previous state of the art, MaskFormer, through several improvements:
  - Replaces the pixel decoder with a more advanced multi-scale deformable attention Transformer.
  - Adopts a Transformer decoder with masked attention to boost performance without extra computation.
  - Improves training efficiency by calculating the loss on subsampled points instead of whole masks (see the sketch after this list).
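To make the point-subsampled loss concrete, here is a minimal sketch (not from the original card) that evaluates a binary mask loss at uniformly sampled point coordinates. `point_sampled_bce` is a hypothetical helper name, and the real training recipe additionally uses importance sampling of uncertain points rather than purely uniform sampling:

```python
import torch
import torch.nn.functional as F

def point_sampled_bce(pred_logits, gt_masks, num_points=12544):
    """pred_logits, gt_masks: (N, 1, H, W) mask logits and float binary targets."""
    n = pred_logits.shape[0]
    # Uniform random point coordinates in [-1, 1], the range grid_sample expects
    coords = torch.rand(n, num_points, 1, 2, device=pred_logits.device) * 2 - 1
    # Evaluate prediction and target at the same sparse points: (N, num_points)
    pred_pts = F.grid_sample(pred_logits, coords, align_corners=False).flatten(1)
    gt_pts = F.grid_sample(gt_masks, coords, align_corners=False).flatten(1)
    # The loss is computed on num_points samples instead of all H * W pixels
    return F.binary_cross_entropy_with_logits(pred_pts, gt_pts)
```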
## Installation

No specific installation steps are provided in the original document. The usage example below assumes the `transformers`, `torch`, `Pillow`, and `requests` packages are installed.
## Usage Examples

### Basic Usage
```python
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation

# Load the image processor and the model from the Hugging Face Hub
processor = AutoImageProcessor.from_pretrained("facebook/mask2former-swin-large-ade-semantic")
model = Mask2FormerForUniversalSegmentation.from_pretrained("facebook/mask2former-swin-large-ade-semantic")

# Download an example image
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Preprocess the image and run inference
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# The model predicts class logits and mask logits per query
class_queries_logits = outputs.class_queries_logits
masks_queries_logits = outputs.masks_queries_logits

# Post-process into a (height, width) semantic segmentation map
predicted_semantic_map = processor.post_process_semantic_segmentation(
    outputs, target_sizes=[image.size[::-1]]
)[0]
```
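The returned `predicted_semantic_map` is a `(height, width)` tensor of ADE20k class indices. As a quick follow-up (not part of the original card), the indices can be mapped to class names via `model.config.id2label`:

```python
# List the class names present in the prediction (continues the example above)
labels = [model.config.id2label[i] for i in predicted_semantic_map.unique().tolist()]
print(labels)
```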
### Advanced Usage

For more code examples, refer to the [Mask2Former documentation](https://huggingface.co/docs/transformers/model_doc/mask2former).
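The same processor also exposes instance and panoptic post-processing. A minimal sketch (not from the original card), assuming a panoptic checkpoint such as `facebook/mask2former-swin-base-coco-panoptic` is loaded in place of the semantic one:

```python
# Post-process into a panoptic segmentation map plus per-segment metadata
result = processor.post_process_panoptic_segmentation(
    outputs, target_sizes=[image.size[::-1]]
)[0]
segmentation = result["segmentation"]    # (height, width) map of segment ids
for segment in result["segments_info"]:  # one dict per predicted segment
    print(segment["id"], model.config.id2label[segment["label_id"]])
```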
## Documentation

### Model description

Mask2Former addresses instance, semantic, and panoptic segmentation with the same paradigm: predicting a set of masks and corresponding labels. All three tasks are therefore treated as if they were instance segmentation. Mask2Former outperforms the previous SOTA, MaskFormer, in both performance and efficiency by (i) replacing the pixel decoder with a more advanced multi-scale deformable attention Transformer, (ii) adopting a Transformer decoder with masked attention to boost performance without introducing additional computation, and (iii) improving training efficiency by calculating the loss on subsampled points instead of whole masks.
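As an illustration of point (ii), here is a minimal single-head sketch (not from the original card) of masked attention: each query attends only to locations that the previous decoder layer's mask prediction marks as foreground, falling back to full attention when a predicted mask is empty. `masked_attention` is a hypothetical name and the tensor shapes are simplified:

```python
import torch

def masked_attention(q, k, v, mask_logits):
    """q: (queries, d); k, v: (pixels, d); mask_logits: (queries, pixels)."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    blocked = mask_logits.sigmoid() < 0.5      # attend only to predicted foreground
    empty = blocked.all(dim=-1, keepdim=True)  # full attention if a mask is empty
    scores = scores.masked_fill(blocked & ~empty, float("-inf"))
    return scores.softmax(dim=-1) @ v
```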

### Intended uses & limitations

You can use this particular checkpoint for semantic segmentation. See the [model hub](https://huggingface.co/models?search=mask2former) to look for other fine-tuned versions on a task that interests you.
## Technical Details

No specific technical details beyond the model description are provided in the original document.
## License

The license for this model is "other".
| Property | Details |
|----------|---------|
| Model Type | Mask2Former trained on ADE20k semantic segmentation (large-sized version, Swin backbone) |
| Training Data | ADE20k |