Mask2Former Swin Large Mapillary Vistas Panoptic

Developed by Facebook
A large Mask2Former model with a Swin backbone, trained on the Mapillary Vistas dataset for panoptic segmentation
Downloads 2,750
Release Time: 1/5/2023

Model Overview

Mask2Former is a unified image segmentation framework. It handles instance, semantic, and panoptic segmentation with one approach: it predicts a set of masks and a class label for each mask. It improves on its predecessor, MaskFormer, in both performance and efficiency.
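To make "a set of masks and corresponding labels" concrete, here is a minimal sketch of how per-query predictions could be merged into one segment map. It is a simplified illustration, not the model's actual post-processing: the function name merge_queries, the 0.5 threshold, the "no object" class index, and the toy numbers are all assumptions made for this example.

```python
def merge_queries(class_probs, mask_probs, no_object_index, threshold=0.5):
    """Assign each pixel to the query with the highest
    class-confidence x mask-probability score (hypothetical helper)."""
    height, width = len(mask_probs[0]), len(mask_probs[0][0])

    # Pick the most likely class per query and drop "no object" queries.
    kept = []
    for query, probs in enumerate(class_probs):
        label = max(range(len(probs)), key=probs.__getitem__)
        if label != no_object_index:
            kept.append((query, label, probs[label]))

    # -1 marks pixels that no kept query claims with enough confidence.
    segment_map = [[-1] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            best_query, best_score = -1, 0.0
            for query, _, confidence in kept:
                score = confidence * mask_probs[query][y][x]
                if score > best_score:
                    best_query, best_score = query, score
            if best_query >= 0 and mask_probs[best_query][y][x] >= threshold:
                segment_map[y][x] = best_query

    labels = {query: label for query, label, _ in kept}
    return segment_map, labels


# Toy input: 3 queries, classes 0 and 1 plus "no object" (index 2), 2x3 image.
class_probs = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
mask_probs = [
    [[0.9, 0.8, 0.2], [0.9, 0.7, 0.1]],
    [[0.1, 0.3, 0.9], [0.2, 0.4, 0.95]],
    [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]],
]
segment_map, labels = merge_queries(class_probs, mask_probs, no_object_index=2)
print(segment_map)  # [[0, 0, 1], [0, 0, 1]]
print(labels)       # {0: 0, 1: 1}
```

Each value in the segment map is the index of the query that owns the pixel, and the labels dictionary gives the class predicted for that query. The same output can be read as instance, semantic, or panoptic segmentation depending on how segments of the same class are grouped.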

Model Features

Unified segmentation framework
Treats instance, semantic, and panoptic segmentation the same way: each predicted mask gets a class label, so all three tasks are handled as if they were instance segmentation
Multi-scale deformable attention
Upgrades the pixel decoder to a multi-scale deformable attention Transformer, improving performance
Masked attention mechanism
Adds a Transformer decoder with masked attention, which improves performance without introducing additional computation
Efficient training
Improves training efficiency by computing the loss on subsampled points instead of on whole masks
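The subsampled-point loss listed above can be illustrated with a small sketch. It compares a binary cross-entropy loss computed over every pixel of a mask with the same loss computed over a few randomly chosen points. The function names, the uniform sampling with replacement, and the point count are assumptions made for this example; the real training procedure may sample differently.

```python
import math
import random


def bce(prob, target):
    """Binary cross-entropy for one predicted probability and a 0/1 target."""
    eps = 1e-7
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1 - target) * math.log(1.0 - prob))


def full_mask_loss(pred, target):
    """Average the loss over every pixel of the mask."""
    values = [
        bce(p, t)
        for pred_row, target_row in zip(pred, target)
        for p, t in zip(pred_row, target_row)
    ]
    return sum(values) / len(values)


def sampled_mask_loss(pred, target, num_points, rng):
    """Average the loss over num_points uniformly sampled pixel locations."""
    height, width = len(pred), len(pred[0])
    total = 0.0
    for _ in range(num_points):
        y, x = rng.randrange(height), rng.randrange(width)
        total += bce(pred[y][x], target[y][x])
    return total / num_points


# Toy 64x64 mask: the left half is foreground, predictions are noisy.
rng = random.Random(0)
target = [[1 if x < 32 else 0 for x in range(64)] for _ in range(64)]
pred = [[0.8 if t else 0.2 + 0.1 * rng.random() for t in row] for row in target]

print(full_mask_loss(pred, target))               # uses 4096 pixels
print(sampled_mask_loss(pred, target, 256, rng))  # uses 256 points
```

The sampled value is an estimate of the full-mask value that touches far fewer pixels, which is where the training-time saving comes from.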

Model Capabilities

Image segmentation
Panoptic segmentation
Instance segmentation
Semantic segmentation

Use Cases

Computer vision
Street scene understanding
Performs panoptic segmentation of street scenes such as those in the Mapillary Vistas dataset
Accurately identifies and segments various objects in street scenes
Object recognition and segmentation
Identifies objects in images and generates precise masks
Demonstrated on example images such as a cat and a castle
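For the use cases above, a minimal usage sketch with the Hugging Face transformers library might look like the following. The Hub identifier in MODEL_ID is an assumption inferred from the model name and developer shown on this page, and the function name segment_street_scene is hypothetical. The sketch requires torch, Pillow, and transformers, and downloads the model weights on first use.

```python
# Assumed Hub identifier, inferred from the model name on this page.
MODEL_ID = "facebook/mask2former-swin-large-mapillary-vistas-panoptic"


def segment_street_scene(image_path):
    """Return (panoptic_map, segments_info) for one image file."""
    # Local imports keep this module loadable without the heavy dependencies.
    import torch
    from PIL import Image
    from transformers import (
        AutoImageProcessor,
        Mask2FormerForUniversalSegmentation,
    )

    processor = AutoImageProcessor.from_pretrained(MODEL_ID)
    model = Mask2FormerForUniversalSegmentation.from_pretrained(MODEL_ID)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # target_sizes expects (height, width); PIL's image.size is (width, height).
    result = processor.post_process_panoptic_segmentation(
        outputs, target_sizes=[image.size[::-1]]
    )[0]
    return result["segmentation"], result["segments_info"]
```

Calling segment_street_scene("street.jpg"), where the file name is a placeholder, would return a per-pixel map of segment ids and a list describing each segment, including its "id" and "label_id".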