Mask2Former Open-Source Instance Segmentation Model - Freely Deploy to Accurately Complete Image Instance Segmentation Tasks

Mask2former Swin Large Coco Instance

Developed by facebook

Mask2Former is a Transformer-based unified image segmentation model, utilizing a Swin-Large backbone and fine-tuned on the COCO dataset, specializing in instance segmentation tasks.

Image Segmentation

Transformers

Open Source License: Other

Tags: #Unified Image Segmentation #Multi-scale Attention #Instance Segmentation SOTA

Downloads 37.31k

Release Time: 1/2/2023

Model Overview

This model performs instance segmentation by predicting a set of masks and their corresponding labels, and uses multi-scale deformable attention to improve performance. It is an improved version of MaskFormer.

Model Features

Unified Segmentation Framework

Handles instance/semantic/panoptic segmentation tasks with the same architecture, simplifying the workflow.

Multi-scale Deformable Attention

Replaces the MaskFormer pixel decoder with a multi-scale deformable attention Transformer, improving both performance and efficiency.

Masked Attention Mechanism

Introduces masked attention in the Transformer decoder, enhancing performance without increasing computational burden.

Efficient Training Strategy

Calculates the loss on sampled points rather than entire masks, which the Mask2Former paper reports as reducing training memory by roughly 3x.

Model Capabilities

Image Instance Segmentation

Multi-object Detection and Segmentation

Complex Scene Parsing

Use Cases

Computer Vision

Object Instance Segmentation

Generates precise segmentation masks for each object instance in an image.

Reported as state of the art for instance segmentation on the COCO dataset at the time of release.

Scene Understanding

Analyzes object distribution and spatial relationships in complex scenes.

🚀 Mask2Former

The Mask2Former model is trained on COCO instance segmentation (large-sized version, Swin backbone). It offers a unified approach to multiple segmentation tasks.

🚀 Quick Start

The Mask2Former model, trained on COCO instance segmentation, predicts a mask and a class label for each object instance in an image. It was introduced in the paper "Masked-attention Mask Transformer for Universal Image Segmentation" and first released in the accompanying GitHub repository.

✨ Features

Unified Paradigm: Addresses instance, semantic, and panoptic segmentation with the same approach by predicting masks and labels.
Performance Improvement: Outperforms the previous SOTA, MaskFormer, in both performance and efficiency.
Advanced Architecture: Replaces the pixel decoder with a multi-scale deformable attention Transformer and uses a masked-attention Transformer decoder.
Training Efficiency: Calculates loss on subsampled points instead of whole masks.
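
The point-sampling idea in the last item can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the Mask2Former training code: it draws pixel indices uniformly at random and averages a binary cross-entropy over them, and the actual implementation may choose and interpolate points differently. The function name `sampled_point_bce` and the `num_points` parameter are hypothetical names chosen for this sketch.

```python
import numpy as np


def sampled_point_bce(mask_logits, target_mask, num_points=64, rng=None):
    """Estimate the binary cross-entropy of one mask from a random subset of pixels.

    mask_logits: float array of shape (H, W) with raw (pre-sigmoid) mask logits.
    target_mask: array of shape (H, W) with ground-truth values 0 or 1.
    num_points:  number of pixels to sample (hypothetical parameter for this sketch).
    """
    rng = np.random.default_rng() if rng is None else rng
    logits = np.asarray(mask_logits, dtype=np.float64).ravel()
    target = np.asarray(target_mask, dtype=np.float64).ravel()

    # Assumption: uniform sampling of pixel indices. Sample with replacement
    # only when more points are requested than there are pixels.
    idx = rng.choice(logits.size, size=num_points, replace=num_points > logits.size)
    z, t = logits[idx], target[idx]

    # Numerically stable BCE-with-logits: max(z, 0) - z * t + log(1 + exp(-|z|))
    loss = np.maximum(z, 0.0) - z * t + np.log1p(np.exp(-np.abs(z)))
    return float(loss.mean())
```

The cost of the loss then scales with `num_points` rather than with the full mask resolution, which is the source of the efficiency gain described above.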

📚 Documentation

Model description

Mask2Former addresses instance, semantic and panoptic segmentation with the same paradigm: by predicting a set of masks and corresponding labels. Hence, all 3 tasks are treated as if they were instance segmentation. Mask2Former outperforms the previous SOTA, MaskFormer, in terms of both performance and efficiency by (i) replacing the pixel decoder with a more advanced multi-scale deformable attention Transformer, (ii) adopting a Transformer decoder with masked attention to boost performance without introducing additional computation and (iii) improving training efficiency by calculating the loss on subsampled points instead of whole masks.
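
To make point (ii) more concrete, here is a minimal NumPy sketch of masked attention: each query attends only to the pixels that its previous mask prediction marks as foreground. It is an illustration under stated assumptions, not the library's implementation. The function name `masked_attention`, the 0.5 threshold default, the single-head layout, and the fallback for empty masks are choices made for this sketch, and the residual connection and learned projections of a real decoder layer are omitted.

```python
import numpy as np


def masked_attention(queries, keys, values, prev_masks, threshold=0.5):
    """Single-head cross-attention restricted to each query's predicted mask region.

    queries:    (N, C) query features.
    keys:       (HW, C) flattened image features used as keys.
    values:     (HW, C) flattened image features used as values.
    prev_masks: (N, HW) mask probabilities predicted by the previous layer.
    """
    # Scaled dot-product scores between every query and every pixel.
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])  # (N, HW)

    # A pixel is visible to a query only if the previous mask marks it foreground.
    allowed = prev_masks >= threshold

    # Assumption for this sketch: a query with an empty mask attends everywhere,
    # which avoids a softmax over an all-masked row.
    empty = ~allowed.any(axis=1)
    allowed[empty] = True

    # Masked-out positions get -inf so their softmax weight is exactly zero.
    scores = np.where(allowed, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)

    return weights @ values, weights
```

Because the mask only changes which scores enter the softmax, the attention has the same shape and cost as ordinary cross-attention, which is consistent with the claim that it adds no extra computation.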


Intended uses & limitations

You can use this particular checkpoint for instance segmentation. See the model hub to look for other fine-tuned versions on a task that interests you.

How to use

Here is how to use this model:

import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation


# load Mask2Former fine-tuned on COCO instance segmentation
processor = AutoImageProcessor.from_pretrained("facebook/mask2former-swin-large-coco-instance")
model = Mask2FormerForUniversalSegmentation.from_pretrained("facebook/mask2former-swin-large-coco-instance")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# model predicts class_queries_logits of shape `(batch_size, num_queries, num_labels + 1)`
# (the extra class is "no object") and masks_queries_logits of shape
# `(batch_size, num_queries, height, width)`
class_queries_logits = outputs.class_queries_logits
masks_queries_logits = outputs.masks_queries_logits

# you can pass them to processor for postprocessing
result = processor.post_process_instance_segmentation(outputs, target_sizes=[image.size[::-1]])[0]
# we refer to the demo notebooks for visualization (see "Resources" section in the Mask2Former docs)
predicted_instance_map = result["segmentation"]

For more code examples, we refer to the documentation.
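
As a follow-up to the snippet above, the post-processed result can be turned into a short per-instance summary. The sketch below assumes that each entry of `segments_info` carries the keys `id`, `label_id` and `score`, and that the segmentation map stores one instance id per pixel; check both against the Transformers version you have installed. The helper name `summarize_instances` is hypothetical.

```python
import numpy as np


def summarize_instances(segmentation, segments_info, id2label):
    """Return one record per predicted instance, sorted by descending score.

    segmentation:  (H, W) array holding an instance id per pixel.
    segments_info: list of dicts, assumed to have "id", "label_id" and "score" keys.
    id2label:      mapping from class id to class name.
    """
    seg = np.asarray(segmentation)
    records = []
    for info in segments_info:
        # Boolean mask of the pixels assigned to this instance.
        mask = seg == info["id"]
        records.append(
            {
                "id": info["id"],
                # Fall back to the numeric id if the class name is unknown.
                "label": id2label.get(info["label_id"], str(info["label_id"])),
                "score": float(info["score"]),
                "area": int(mask.sum()),
            }
        )
    return sorted(records, key=lambda r: r["score"], reverse=True)
```

With the variables from the example above, a call might look like `summarize_instances(result["segmentation"].numpy(), result["segments_info"], model.config.id2label)`.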

📄 License

License: other

Property	Details
Model Type	Mask2Former model trained on COCO instance segmentation (large-sized version, Swin backbone)
Training Data	COCO

Disclaimer: The team releasing Mask2Former did not write a model card for this model, so this model card was written by the Hugging Face team.
