
Mask2former Swin Base IN21k Cityscapes Semantic

Developed by facebook
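A Mask2Former image segmentation model with a Swin Transformer (Base) backbone pre-trained on ImageNet-21k and fine-tuned for semantic segmentation on Cityscapes. The underlying architecture handles instance, semantic, and panoptic segmentation with a single design.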
Downloads 329
Release Time: 1/16/2023

Model Overview

Mask2Former is a Transformer-based image segmentation model. It treats instance, semantic, and panoptic segmentation as the same problem: predicting a set of masks, each with a corresponding class label.
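To make the "set of masks and corresponding labels" idea concrete, here is a minimal NumPy sketch of how such a set can be collapsed into a semantic segmentation map. It assumes each query outputs class logits with a trailing "no object" column plus one mask logit per pixel; the function name `semantic_map` and the toy inputs are illustrative, not part of any released API.

```python
import numpy as np

def semantic_map(class_logits, mask_logits):
    """Combine Q (mask, label) predictions into one per-pixel class map.

    class_logits: (Q, K + 1) scores; the last column is the "no object" class.
    mask_logits:  (Q, H, W) per-query binary mask logits.
    """
    # Softmax over classes, then drop the "no object" column.
    shifted = class_logits - class_logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    class_probs = probs[:, :-1]                        # (Q, K)

    # Sigmoid turns mask logits into per-pixel membership probabilities.
    mask_probs = 1.0 / (1.0 + np.exp(-mask_logits))    # (Q, H, W)

    # A pixel's score for class k sums, over queries,
    # P(query is class k) * P(pixel belongs to that query's mask).
    scores = np.einsum("qk,qhw->khw", class_probs, mask_probs)
    return scores.argmax(axis=0)                       # (H, W)

# Toy example: 2 queries, 2 real classes, a 2x2 image.
class_logits = np.array([[5.0, 0.0, 0.0],    # query 0 votes for class 0
                         [0.0, 5.0, 0.0]])   # query 1 votes for class 1
mask_logits = np.array([[[4.0, 4.0], [-4.0, -4.0]],    # query 0: top row
                        [[-4.0, -4.0], [4.0, 4.0]]])   # query 1: bottom row
label_map = semantic_map(class_logits, mask_logits)
```

In the toy example the top row of pixels is assigned class 0 and the bottom row class 1, because each query is confident about both its label and its region.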

Model Features

Unified Segmentation Architecture
Uses the same model architecture for all three segmentation tasks (instance, semantic, and panoptic)
Masked Attention Mechanism
The Transformer decoder uses masked attention, which restricts each query's cross-attention to the region of its previously predicted mask and improves performance without adding computation
Efficient Training Strategy
Computes the mask loss on a set of sampled points rather than on full masks, which reduces training memory and cost
Multi-scale Feature Processing
Uses a multi-scale deformable attention pixel decoder to capture features at several resolutions
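The masked attention feature can be sketched in a few lines of NumPy. This is a simplified single-head version under stated assumptions: the 0.5 threshold, the scaling by the square root of the feature size, and the fallback for empty masks are choices made here for illustration, and the residual connection and learned projections of a real decoder layer are omitted. The name `masked_cross_attention` is hypothetical.

```python
import numpy as np

def masked_cross_attention(queries, keys, values, prev_mask_probs, threshold=0.5):
    """Cross-attention restricted to each query's previously predicted mask.

    queries:         (Q, D) query features
    keys, values:    (N, D) features for N flattened pixel positions
    prev_mask_probs: (Q, N) mask probabilities from the previous decoder layer
    """
    logits = queries @ keys.T / np.sqrt(queries.shape[1])   # (Q, N)

    # Keep only positions inside each query's predicted foreground region.
    keep = prev_mask_probs >= threshold
    # Assumption: a query with an empty mask attends everywhere,
    # which avoids a row made entirely of -inf.
    keep[~keep.any(axis=1)] = True

    # Masked-out positions get -inf, so they receive zero attention weight.
    logits = np.where(keep, logits, -np.inf)
    logits -= logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ values, weights

# Toy example: 2 queries attending over 4 pixel positions.
rng = np.random.default_rng(0)
queries = rng.normal(size=(2, 8))
keys = rng.normal(size=(4, 8))
values = rng.normal(size=(4, 8))
prev_mask_probs = np.array([[0.9, 0.8, 0.1, 0.2],    # foreground: positions 0, 1
                            [0.0, 0.0, 0.0, 0.0]])   # empty mask
updated, weights = masked_cross_attention(queries, keys, values, prev_mask_probs)
```

The mask only changes which logits survive the softmax, so the cost is the same as ordinary cross-attention. Query 0 puts all of its weight on positions 0 and 1, while query 1 falls back to attending over every position.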

Model Capabilities

Semantic Segmentation
Instance Segmentation
Panoptic Segmentation
Multi-scale Image Analysis
Object Recognition and Localization

Use Cases

Autonomous Driving
Street Scene Semantic Segmentation
Identifying key elements like roads, vehicles, and pedestrians
This checkpoint is fine-tuned on the Cityscapes dataset for this task
Medical Imaging
Organ Segmentation
Segmentation of organ tissues in CT/MRI images, after fine-tuning on medical data
Remote Sensing
Land Cover Classification
Identifying different land types in satellite images, after fine-tuning on remote sensing data
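For the street-scene use case above, the following is a hedged sketch of running the checkpoint through the Hugging Face Transformers library. The Hub id in `MODEL_ID` is an assumption based on this model's name and should be verified before use. The helper names `segment_image` and `class_fractions` are hypothetical. The heavy dependencies (torch, transformers, Pillow) are imported inside `segment_image`, so the small summary helper can be used without them.

```python
from collections import Counter

# Assumed Hugging Face Hub id for this checkpoint; verify before use.
MODEL_ID = "facebook/mask2former-swin-base-IN21k-cityscapes-semantic"

def segment_image(image_path, model_id=MODEL_ID):
    """Return (label_map, id2label) for one image.

    Requires torch, transformers, and Pillow; downloads weights on first call.
    """
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation

    processor = AutoImageProcessor.from_pretrained(model_id)
    model = Mask2FormerForUniversalSegmentation.from_pretrained(model_id)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # PIL reports (width, height); the post-processor expects (height, width).
    label_map = processor.post_process_semantic_segmentation(
        outputs, target_sizes=[image.size[::-1]]
    )[0]
    return label_map.tolist(), model.config.id2label

def class_fractions(label_map, id2label):
    """Share of pixels per class name in a 2-D list of class ids."""
    counts = Counter(cid for row in label_map for cid in row)
    total = sum(counts.values())
    return {id2label[cid]: n / total for cid, n in counts.items()}
```

A typical call would be `label_map, id2label = segment_image("street.jpg")` followed by `class_fractions(label_map, id2label)` to see how much of the frame is road, vehicle, pedestrian, and so on. The file name here is a placeholder.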