Mask2Former Open-Source Image Segmentation Model - Free Deployment for Instance Segmentation Tasks on COCO Dataset

Mask2Former Swin Base IN21k COCO Instance

Developed by facebook

Mask2Former is a Transformer-based universal image segmentation model, fine-tuned on the COCO dataset for instance segmentation tasks

Image Segmentation

Transformers

Open Source License: Other

#Unified Image Segmentation #Multi-scale Attention #Instance Segmentation

Downloads 26

Release Time: 1/16/2023

Model Overview

Uses one architecture for instance, semantic, and panoptic segmentation by predicting a set of masks and their corresponding class labels

Model Features

Unified Segmentation Architecture

Uses the same model architecture to handle three segmentation tasks: instance, semantic, and panoptic

Masked Attention Mechanism

A Transformer decoder with masked attention improves performance without adding computation

Efficient Training Strategy

Improves training efficiency by computing the loss on sampled points rather than on entire masks
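
The idea of a point-sampled loss can be illustrated with a small pure-Python sketch. It assumes uniform random sampling and a binary cross-entropy loss, and it does not attempt to reproduce the sampling scheme or loss weighting of the actual Mask2Former training code. The names `bce_with_logits`, `sampled_point_loss`, and `full_mask_loss` are hypothetical helpers written for this illustration.

```python
import math
import random


def bce_with_logits(logit: float, target: float) -> float:
    """Numerically stable binary cross-entropy for one logit/target pair."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def sampled_point_loss(pred_logits, target_mask, num_points, rng):
    """Average BCE over `num_points` uniformly sampled pixel locations.

    pred_logits and target_mask are 2D lists with the same height and width.
    """
    height, width = len(pred_logits), len(pred_logits[0])
    total = 0.0
    for _ in range(num_points):
        y = rng.randrange(height)
        x = rng.randrange(width)
        total += bce_with_logits(pred_logits[y][x], float(target_mask[y][x]))
    return total / num_points


def full_mask_loss(pred_logits, target_mask):
    """Reference value: average BCE over every pixel of the mask."""
    losses = [
        bce_with_logits(p, float(t))
        for pred_row, target_row in zip(pred_logits, target_mask)
        for p, t in zip(pred_row, target_row)
    ]
    return sum(losses) / len(losses)


# Toy 64x64 example: the target is a filled square, the prediction a noisy copy.
rng = random.Random(0)
size = 64
target = [
    [1 if 16 <= y < 48 and 16 <= x < 48 else 0 for x in range(size)]
    for y in range(size)
]
logits = [[(4.0 if t else -4.0) + rng.gauss(0.0, 2.0) for t in row] for row in target]

full = full_mask_loss(logits, target)
sampled = sampled_point_loss(logits, target, num_points=512, rng=rng)
print(f"full-mask loss: {full:.4f}, sampled-point loss: {sampled:.4f}")
```

In this toy setup the sampled estimate touches 512 locations instead of all 4096 pixels, which is the source of the saving: the loss is an estimate of the full-mask value computed from a fraction of the locations.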

Model Capabilities

Image Instance Segmentation

Multi-object Recognition and Segmentation

Complex Scene Parsing

Use Cases

Computer Vision

Object Instance Segmentation

Accurately segments each object instance in an image

The Mask2Former paper reports state-of-the-art results on the COCO dataset at the time of publication

Scene Understanding

Parses objects and their spatial relationships in complex scenes

🚀 Mask2Former

This Mask2Former checkpoint is trained on COCO instance segmentation (base-sized version with a Swin backbone pretrained on ImageNet-21k). It offers a unified approach for various image segmentation tasks.

🚀 Quick Start

The Mask2Former model is designed to handle instance, semantic, and panoptic segmentation using a single paradigm. It was introduced in the paper "Masked-attention Mask Transformer for Universal Image Segmentation" and initially released in this repository.

Disclaimer: The team releasing Mask2Former did not write a model card for this model, so this model card has been written by the Hugging Face team.

✨ Features

Unified Paradigm: Addresses instance, semantic, and panoptic segmentation by predicting a set of masks and corresponding labels, treating all three tasks as instance segmentation.
Performance and Efficiency: Outperforms the previous SOTA, MaskFormer, in both performance and efficiency through several key improvements:
- Replaces the pixel decoder with a more advanced multi-scale deformable attention Transformer.
- Adopts a Transformer decoder with masked attention to boost performance without additional computation.
- Improves training efficiency by calculating the loss on subsampled points instead of whole masks.
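
The masked-attention item in the list above can be illustrated with a minimal pure-Python sketch for a single query. It assumes standard scaled dot-product attention and a 0.5 threshold on the previous layer's mask probabilities. The function name `masked_attention`, its arguments, and the empty-mask fallback are choices made for this illustration, not the library's API.

```python
import math


def masked_attention(query, keys, values, prev_mask_probs, threshold=0.5):
    """Cross-attention for one query, restricted to its predicted foreground.

    query: list of floats (dimension d)
    keys, values: one list of floats per image location
    prev_mask_probs: the previous layer's mask probability per location
    """
    scale = 1.0 / math.sqrt(len(query))
    raw_scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]

    # Keep locations the previous layer predicted as foreground; mask out the rest.
    keep = [prob >= threshold for prob in prev_mask_probs]
    if not any(keep):
        # Guard added for this sketch: an empty mask would leave nothing to
        # attend to, so fall back to ordinary (unmasked) attention.
        keep = [True] * len(keys)
    scores = [s if k else float("-inf") for s, k in zip(raw_scores, keep)]

    # Softmax; exp(-inf) evaluates to 0.0, so masked locations get zero weight.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights


# Four image locations; only the first two lie inside the predicted mask.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [5.0, 0.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0], [0.0, 2.0]]
prev_mask_probs = [0.9, 0.8, 0.1, 0.3]

output, weights = masked_attention(query, keys, values, prev_mask_probs)
print("attention weights:", weights)
print("attended output:", output)
```

The third location has the largest raw score, yet it receives zero weight because it falls outside the predicted mask. The cost matches ordinary attention, since the mask only changes which scores survive the softmax.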

📚 Documentation

Intended uses & limitations

You can use this particular checkpoint for instance segmentation. Check the model hub to find other fine-tuned versions for tasks that interest you.

How to use

Here is how to use this model:

import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation


# load Mask2Former fine-tuned on COCO instance segmentation
processor = AutoImageProcessor.from_pretrained("facebook/mask2former-swin-base-IN21k-coco-instance")
model = Mask2FormerForUniversalSegmentation.from_pretrained("facebook/mask2former-swin-base-IN21k-coco-instance")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# model predicts class_queries_logits of shape `(batch_size, num_queries, num_labels + 1)`
# (the extra class is "no object") and masks_queries_logits of shape `(batch_size, num_queries, height, width)`
class_queries_logits = outputs.class_queries_logits
masks_queries_logits = outputs.masks_queries_logits

# you can pass them to processor for postprocessing
result = processor.post_process_instance_segmentation(outputs, target_sizes=[image.size[::-1]])[0]
# we refer to the demo notebooks for visualization (see "Resources" section in the Mask2Former docs)
predicted_instance_map = result["segmentation"]

For more code examples, refer to the documentation.
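
Besides `"segmentation"`, the dictionary returned by `post_process_instance_segmentation` contains a `"segments_info"` list with an `id`, `label_id`, and `score` for each detected instance. The sketch below shows one way to turn that list into readable rows. `summarize_instances` and its `min_score` parameter are hypothetical helpers for this illustration, and the example data is made up so that the block runs without downloading the model.

```python
def summarize_instances(segments_info, id2label, min_score=0.5):
    """Return (segment id, class name, score) for each confident instance."""
    summary = []
    for segment in segments_info:
        if segment["score"] < min_score:
            continue
        # Fall back to the raw id if the mapping has no name for it
        label = id2label.get(segment["label_id"], str(segment["label_id"]))
        summary.append((segment["id"], label, round(segment["score"], 3)))
    # Highest-scoring instances first
    return sorted(summary, key=lambda item: item[2], reverse=True)


# Stand-in data shaped like result["segments_info"]; the values are made up.
example_segments = [
    {"id": 0, "label_id": 15, "score": 0.97},
    {"id": 1, "label_id": 15, "score": 0.94},
    {"id": 2, "label_id": 57, "score": 0.31},
]
# Illustrative mapping; with the real model, use model.config.id2label.
example_id2label = {15: "cat", 57: "couch"}

for seg_id, label, score in summarize_instances(example_segments, example_id2label):
    print(seg_id, label, score)
```

With the objects from the snippet above, the call would be `summarize_instances(result["segments_info"], model.config.id2label)`. Each `id` corresponds to the value that marks that instance's pixels in `result["segmentation"]`.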

📄 License

License: other

Property	Details
Tags	vision, image-segmentation
Datasets	coco
