Mask2Former Swin-Small Cityscapes Semantic
A Mask2Former model with a Swin-Small backbone, fine-tuned for semantic segmentation on the Cityscapes dataset
Downloads: 952
Release Time: 1/5/2023
Model Overview
Mask2Former is a universal image segmentation model that uses a single unified paradigm to handle instance segmentation, semantic segmentation, and panoptic segmentation. It performs segmentation by predicting a set of binary masks together with a class label for each mask.
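The mask-classification paradigm described above can be sketched in a few lines: each predicted mask carries a class distribution, and a semantic map is obtained by weighting each mask's class scores by its per-pixel probability. This is a simplified illustration (the real model also handles a "no object" class), with toy tensors standing in for network outputs:

```python
import numpy as np

def masks_to_semantic_map(mask_logits, class_logits):
    """Combine N predicted masks and their class scores into a per-pixel
    semantic map (simplified sketch of mask classification).

    mask_logits:  (N, H, W) per-mask logits
    class_logits: (N, C)    per-mask class logits over C classes
    """
    mask_probs = 1.0 / (1.0 + np.exp(-mask_logits))         # sigmoid per mask
    class_probs = np.exp(class_logits)
    class_probs /= class_probs.sum(axis=-1, keepdims=True)  # softmax per mask
    # Weight each mask's class distribution by its per-pixel probability,
    # then sum over the N mask predictions.
    scores = np.einsum("nc,nhw->chw", class_probs, mask_probs)
    return scores.argmax(axis=0)  # (H, W) class id per pixel

# Toy example: 2 masks, 3 classes, 2x2 image
mask_logits = np.array([[[5., 5.], [-5., -5.]],    # mask 0 covers the top row
                        [[-5., -5.], [5., 5.]]])   # mask 1 covers the bottom row
class_logits = np.array([[4., 0., 0.],             # mask 0 -> class 0
                         [0., 0., 4.]])            # mask 1 -> class 2
print(masks_to_semantic_map(mask_logits, class_logits))
```

The top row of the toy image is assigned class 0 and the bottom row class 2, matching which mask dominates each pixel.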
Model Features
Unified Segmentation Paradigm
Treats instance segmentation, semantic segmentation, and panoptic segmentation as a single mask-classification task
Efficient Attention Mechanism
Uses a multi-scale deformable attention Transformer as the pixel decoder in place of conventional decoder designs
Masked Attention Decoder
Introduces masked attention in the Transformer decoder, restricting each query's cross-attention to its predicted foreground region, which improves accuracy without increasing computational cost
Efficient Training Method
Computes the mask loss on a small set of sampled points rather than on entire masks, significantly reducing memory use and improving training efficiency
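The point-sampling idea behind the last feature can be illustrated with a minimal sketch: instead of evaluating the mask loss over every pixel, the loss is computed only at a handful of sampled locations. This uses uniform random sampling for simplicity (the actual method uses importance sampling), and all names here are illustrative:

```python
import numpy as np

def point_sampled_bce(pred_logits, target, num_points, rng):
    """Binary cross-entropy evaluated at num_points sampled pixel
    locations instead of over the full mask. Simplified sketch:
    uniform sampling stands in for the importance sampling used
    in actual training.
    """
    h, w = target.shape
    ys = rng.integers(0, h, size=num_points)
    xs = rng.integers(0, w, size=num_points)
    logits = pred_logits[ys, xs]
    labels = target[ys, xs].astype(float)
    # Numerically stable BCE-with-logits at the sampled points only
    loss = np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))
    return loss.mean()

rng = np.random.default_rng(0)
target = np.zeros((64, 64))
target[:, 32:] = 1                             # right half is foreground
good_pred = np.where(target == 1, 8.0, -8.0)   # confident, correct logits
loss = point_sampled_bce(good_pred, target, num_points=128, rng=rng)
```

Sampling 128 points instead of all 4096 pixels cuts the per-mask loss computation by more than an order of magnitude while still giving a near-zero loss for a correct prediction.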
Model Capabilities
Image Semantic Segmentation
Multi-category Object Recognition
High-precision Mask Prediction
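The capabilities above can be exercised through the Hugging Face Transformers API; the sketch below assumes the checkpoint is published under the standard `facebook/` namespace and substitutes a blank placeholder image for a real street photo:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation

# Assumed checkpoint name for this model card
ckpt = "facebook/mask2former-swin-small-cityscapes-semantic"
processor = AutoImageProcessor.from_pretrained(ckpt)
model = Mask2FormerForUniversalSegmentation.from_pretrained(ckpt)
model.eval()

image = Image.new("RGB", (512, 256))  # placeholder; use a real street scene
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Per-pixel Cityscapes class ids at the original resolution
seg = processor.post_process_semantic_segmentation(
    outputs, target_sizes=[image.size[::-1]]
)[0]
print(seg.shape)
```

`post_process_semantic_segmentation` resolves the predicted masks and class scores into a single (H, W) label map, so `seg` here has shape (256, 512) with one Cityscapes class id per pixel.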
Use Cases
Autonomous Driving
Street Scene Semantic Segmentation
Performs pixel-level classification of urban road scenes to identify elements such as roads, vehicles, and pedestrians
Excellent performance on the Cityscapes dataset
Remote Sensing Image Analysis
Land Cover Classification
Performs segmentation of land cover types on satellite or aerial images