Mask2Former Swin Base IN21k Cityscapes Semantic
A general-purpose image segmentation model based on a Swin Transformer backbone that unifies instance, semantic, and panoptic segmentation in a single architecture
Downloads 329
Release Time: 1/16/2023
Model Overview
Mask2Former is an advanced Transformer-based image segmentation model that unifies instance, semantic, and panoptic segmentation by predicting a set of masks and their corresponding class labels.
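Below is a minimal inference sketch using the Hugging Face transformers library. The checkpoint ID and the input image path are illustrative assumptions and may need to be adjusted to the actual repository name.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation

# Assumed checkpoint ID for this model card; adjust to the actual repository name.
checkpoint = "facebook/mask2former-swin-base-IN21k-cityscapes-semantic"

processor = AutoImageProcessor.from_pretrained(checkpoint)
model = Mask2FormerForUniversalSegmentation.from_pretrained(checkpoint)

# Hypothetical input image of a street scene.
image = Image.open("street_scene.jpg")

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Post-process the per-query masks and class logits into a single
# semantic segmentation map (one Cityscapes class ID per pixel).
semantic_map = processor.post_process_semantic_segmentation(
    outputs, target_sizes=[image.size[::-1]]
)[0]
print(semantic_map.shape)  # (height, width)
```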
Model Features
Unified Segmentation Architecture
Uses the same model architecture to handle three segmentation tasks (instance/semantic/panoptic)
Masked Attention Mechanism
An innovative masked-attention Transformer decoder restricts cross-attention to predicted foreground regions, improving performance without increasing computational cost (a minimal sketch follows this feature list)
Efficient Training Strategy
Computes the training loss on sampled points instead of full masks, significantly improving training efficiency
Multi-scale Feature Processing
Employs deformable attention mechanism to effectively capture multi-scale features
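To make the masked-attention feature concrete, here is a minimal, self-contained PyTorch sketch. The tensor shapes, function name, and the 0.5 threshold are illustrative assumptions, not the exact implementation: the idea is that cross-attention logits are suppressed outside the foreground of the mask predicted by the previous decoder layer.

```python
import torch
import torch.nn.functional as F

def masked_cross_attention(queries, keys, values, prev_mask_logits, threshold=0.5):
    """Illustrative masked attention: each query attends only to pixels that
    fall inside the foreground of its mask predicted by the previous layer.

    queries:          (num_queries, dim)
    keys, values:     (num_pixels, dim) flattened image features
    prev_mask_logits: (num_queries, num_pixels) mask logits from the previous layer
    """
    scale = queries.shape[-1] ** 0.5
    attn_logits = queries @ keys.T / scale                # (num_queries, num_pixels)

    # Background positions (mask probability below threshold) are excluded.
    background = prev_mask_logits.sigmoid() < threshold
    # Guard: if a query's predicted mask is empty, let it attend everywhere
    # instead of producing NaNs from an all-masked softmax row.
    empty = background.all(dim=-1, keepdim=True)
    background = background & ~empty

    attn_logits = attn_logits.masked_fill(background, float("-inf"))
    attn = F.softmax(attn_logits, dim=-1)
    return attn @ values                                  # (num_queries, dim)
```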
Model Capabilities
Semantic Segmentation
Instance Segmentation
Panoptic Segmentation
Multi-scale Image Analysis
Object Recognition and Localization
Use Cases
Autonomous Driving
Street Scene Semantic Segmentation
Identifying key street elements such as roads, vehicles, and pedestrians (see the per-class mask example after the use-case list)
Achieves state-of-the-art performance on the Cityscapes dataset
Medical Imaging
Organ Segmentation
Precise segmentation of organ tissues in CT/MRI images
Remote Sensing
Land Cover Classification
Identifying different land types in satellite images
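As an illustration of the street-scene use case, the snippet below builds on the inference sketch above (the model and semantic_map variables are assumed from that sketch) and extracts one binary mask per class of interest; road, car, and person are standard Cityscapes class names.

```python
# Continues from the inference sketch above: `model` and `semantic_map` are assumed.
for class_name in ("road", "car", "person"):
    class_id = model.config.label2id[class_name]
    binary_mask = semantic_map == class_id            # (height, width) boolean tensor
    coverage = binary_mask.float().mean().item() * 100
    print(f"{class_name}: {coverage:.1f}% of the image")
```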