Mask2former Swin Large Cityscapes Instance

Developed by facebook
A unified image segmentation model built on the Swin-Large backbone, supporting instance, semantic, and panoptic segmentation tasks
Downloads 1,248
Release Time: 1/5/2023

Model Overview

Mask2Former is a unified, Transformer-based image segmentation model. It handles three major tasks (instance segmentation, semantic segmentation, and panoptic segmentation) with a single formulation: predicting a set of masks together with a class label for each mask.
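
As a rough illustration of "predicting masks and corresponding labels", the toy sketch below turns per-query class probabilities and per-query mask probabilities into a list of instances. It is a simplified stand-in, not the model's actual post-processing: the function name, the thresholds, and the trailing "no object" class entry are assumptions made for this example.

```python
# Toy sketch of mask-classification inference; not the library's implementation.
# queries_to_instances and both thresholds are illustrative assumptions.

def queries_to_instances(class_probs, mask_probs,
                         score_threshold=0.5, mask_threshold=0.5):
    """Turn per-query predictions into a list of instances.

    class_probs: one list per query with probabilities over K classes,
                 followed by a final "no object" entry.
    mask_probs:  one 2D grid (list of rows) per query holding per-pixel
                 foreground probabilities.
    """
    instances = []
    for probs, mask in zip(class_probs, mask_probs):
        object_probs = probs[:-1]  # drop the trailing "no object" entry
        label = max(range(len(object_probs)), key=object_probs.__getitem__)
        score = object_probs[label]
        if score < score_threshold:
            continue  # low-confidence query: treated as "no object"
        binary = [[1 if p > mask_threshold else 0 for p in row] for row in mask]
        if not any(any(row) for row in binary):
            continue  # skip queries whose mask is empty after thresholding
        instances.append({"label": label, "score": score, "mask": binary})
    return instances


class_probs = [
    [0.05, 0.90, 0.05],  # query 0: confident about class 1
    [0.10, 0.10, 0.80],  # query 1: mostly "no object"
]
mask_probs = [
    [[0.9, 0.8], [0.1, 0.2]],
    [[0.6, 0.6], [0.6, 0.6]],
]
instances = queries_to_instances(class_probs, mask_probs)
```

Here only the first query survives, yielding one instance of class 1 that covers the top row of the 2x2 grid. The same per-query outputs can be merged differently to produce semantic or panoptic maps, which is what makes the formulation unified.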

Model Features

Unified segmentation framework
Treats instance segmentation, semantic segmentation, and panoptic segmentation as the same mask prediction problem
Multi-scale deformable attention
The pixel decoder uses multi-scale deformable attention to improve feature extraction
Masked attention decoder
Introduces a Transformer decoder with masked attention, which improves performance without adding computation
Efficient training strategy
Computes the loss on a subsample of points rather than on whole masks, which significantly improves training efficiency
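
Two of the features above, masked attention and the point-sampled loss, can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the model's implementation: the function names are hypothetical, attention is shown for a single query over a flat list of pixel locations, and points are drawn uniformly at random, which may differ from the sampling scheme used in actual training.

```python
import math
import random


def masked_attention_weights(scores, foreground):
    """Softmax over one query's attention scores, restricted to foreground.

    scores:     raw attention logits of the query against each pixel location.
    foreground: booleans derived from the previous layer's mask prediction.
    """
    if not any(foreground):
        # Empty mask: fall back to attending everywhere (choice of this sketch).
        foreground = [True] * len(scores)
    masked = [s if keep else -math.inf for s, keep in zip(scores, foreground)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in masked]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


def point_sampled_bce(pred_probs, target, num_points, rng):
    """Binary cross-entropy averaged over randomly sampled pixels only."""
    height, width = len(target), len(target[0])
    coords = [(r, c) for r in range(height) for c in range(width)]
    points = rng.sample(coords, min(num_points, len(coords)))
    eps = 1e-7  # keeps log() away from 0
    total = 0.0
    for r, c in points:
        p = min(max(pred_probs[r][c], eps), 1 - eps)
        y = target[r][c]
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(points)


# Locations 2 and 3 are background, so they receive zero attention weight.
weights = masked_attention_weights([2.0, 1.0, 0.5, 3.0],
                                   [True, True, False, False])

# The loss touches 2 of the 4 pixels instead of the full mask.
rng = random.Random(0)
loss = point_sampled_bce([[0.9, 0.2], [0.8, 0.1]],
                         [[1, 0], [1, 0]], num_points=2, rng=rng)
```

Masking only removes terms from the softmax, so it adds no work compared with ordinary cross-attention, and the sampled loss scales with the number of points rather than the mask resolution.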

Model Capabilities

Instance segmentation
Semantic segmentation
Panoptic segmentation
Image scene understanding
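
A hedged sketch of running instance segmentation with the Hugging Face transformers library follows. The Hub ID in CHECKPOINT is an assumption inferred from the model name and developer shown on this page, so verify it before use. segment_instances and count_by_label are hypothetical helper names, and the example segment list is hand-written rather than real model output.

```python
from collections import Counter

# Assumed Hugging Face Hub ID for this checkpoint; verify before use.
CHECKPOINT = "facebook/mask2former-swin-large-cityscapes-instance"


def segment_instances(image, threshold=0.5):
    """Run instance segmentation on a PIL image (downloads weights on first call)."""
    # Imported lazily so the helper below works without these packages installed.
    import torch
    from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation

    processor = AutoImageProcessor.from_pretrained(CHECKPOINT)
    model = Mask2FormerForUniversalSegmentation.from_pretrained(CHECKPOINT)
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # target_sizes expects (height, width); PIL's image.size is (width, height).
    result = processor.post_process_instance_segmentation(
        outputs, threshold=threshold, target_sizes=[image.size[::-1]]
    )[0]
    return result, model.config.id2label


def count_by_label(segments_info, id2label):
    """Count predicted instances per class name."""
    return Counter(id2label[s["label_id"]] for s in segments_info)


# Hand-written stand-in for result["segments_info"], for illustration only.
example_segments = [
    {"id": 0, "label_id": 2, "score": 0.97},
    {"id": 1, "label_id": 2, "score": 0.91},
    {"id": 2, "label_id": 0, "score": 0.88},
]
example_labels = {0: "person", 2: "car"}
counts = count_by_label(example_segments, example_labels)
```

With a real image, `result, id2label = segment_instances(image)` followed by `count_by_label(result["segments_info"], id2label)` would give a per-class instance count, and `result["segmentation"]` would hold the per-pixel instance ID map.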

Use Cases

Autonomous driving
Road scene analysis
Identify individual instances such as vehicles, pedestrians, and riders in urban road scenes
Achieves state-of-the-art (SOTA) performance on the Cityscapes dataset, as reported at release
Medical imaging
Organ segmentation
Identify specific organs or lesion areas in medical images (this checkpoint is trained on Cityscapes, so it would need fine-tuning on medical data first)