Mask2Former Swin-Small Cityscapes Instance Segmentation
Mask2Former is a unified, Transformer-based image segmentation model that uses a masked attention mechanism to improve performance.
Release Date: 1/5/2023
Model Overview
This model is the small variant of Mask2Former, using a Swin Transformer backbone and fine-tuned for instance segmentation on the Cityscapes dataset. The same unified architecture also supports semantic and panoptic segmentation.
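Assuming this card corresponds to the `facebook/mask2former-swin-small-cityscapes-instance` checkpoint on the Hugging Face Hub (the exact model ID is an assumption based on the card's title), inference can be sketched with the `transformers` library:

```python
# Minimal inference sketch. The checkpoint ID is an assumption inferred
# from this card's title; running the main block downloads the weights.
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation

MODEL_ID = "facebook/mask2former-swin-small-cityscapes-instance"

def run_instance_segmentation(image: Image.Image):
    processor = AutoImageProcessor.from_pretrained(MODEL_ID)
    model = Mask2FormerForUniversalSegmentation.from_pretrained(MODEL_ID)
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Post-process into a per-pixel instance map plus per-segment metadata
    result = processor.post_process_instance_segmentation(
        outputs, target_sizes=[image.size[::-1]]  # (height, width)
    )[0]
    return result["segmentation"], result["segments_info"]

if __name__ == "__main__":
    url = "http://images.cocodataset.org/val2017/000000039769.jpg"
    image = Image.open(requests.get(url, stream=True).raw)
    seg_map, segments = run_instance_segmentation(image)
    print(f"Found {len(segments)} instances")
```

Each entry in `segments_info` carries a predicted label ID and confidence score; `segmentation` is a per-pixel map assigning each pixel to an instance.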
Model Features
Unified Segmentation Architecture
Uses a unified paradigm to handle instance segmentation, semantic segmentation, and panoptic segmentation tasks
Mask Attention Mechanism
Introduces a Transformer decoder with masked attention, which restricts each query's cross-attention to its predicted mask region, improving performance without increasing computational cost
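The masked attention idea can be illustrated with a small NumPy sketch (not the model's actual implementation): attention logits outside each query's predicted mask are set to negative infinity before the softmax, so those locations receive zero weight. Queries whose mask is entirely empty fall back to full attention, keeping the softmax well-defined.

```python
# Illustrative sketch of masked cross-attention, assuming a simplified
# single-head setup; the real model operates on multi-scale features.
import numpy as np

def masked_attention(Q, K, V, mask):
    """Q: (nq, d); K, V: (nk, d); mask: (nq, nk) boolean, True = attend."""
    d = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d)              # scaled dot-product scores
    logits = np.where(mask, logits, -np.inf)   # block locations outside mask
    # Fall back to full attention for queries with an entirely empty mask
    empty = ~mask.any(axis=1)
    logits[empty] = (Q[empty] @ K.T) / np.sqrt(d)
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w = w / w.sum(axis=1, keepdims=True)
    return w @ V, w

rng = np.random.default_rng(0)
Q = rng.standard_normal((2, 8))
K = rng.standard_normal((5, 8))
V = rng.standard_normal((5, 8))
# Query 0 attends to locations 0-1, query 1 to locations 2-4
mask = np.array([[True, True, False, False, False],
                 [False, False, True, True, True]])
out, w = masked_attention(Q, K, V, mask)
```

Because the masking is applied to logits rather than to the feature maps, the computation per attention layer is essentially unchanged, which is why the mechanism adds no extra cost.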
Efficient Training Strategy
Computes the mask loss on sampled points rather than entire masks, significantly reducing memory use and improving training efficiency
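A minimal sketch of the point-sampled loss idea: instead of evaluating binary cross-entropy at every pixel of a predicted mask, the loss is computed only at K sampled points. Uniform sampling is used here for brevity; the actual training recipe combines uniform and importance sampling.

```python
# Sketch of point-sampled mask loss: evaluate BCE at a handful of sampled
# points instead of the full mask, cutting training memory and compute.
import numpy as np

def sampled_bce_loss(pred_logits, target, num_points=16, rng=None):
    """pred_logits, target: (H, W) arrays; points drawn uniformly at random."""
    if rng is None:
        rng = np.random.default_rng(0)
    H, W = target.shape
    ys = rng.integers(0, H, size=num_points)
    xs = rng.integers(0, W, size=num_points)
    logits = pred_logits[ys, xs]
    labels = target[ys, xs]
    # Numerically stable binary cross-entropy on the sampled points only
    loss = (np.maximum(logits, 0) - logits * labels
            + np.log1p(np.exp(-np.abs(logits))))
    return loss.mean()

pred = np.full((64, 64), 4.0)  # confident foreground prediction everywhere
gt = np.ones((64, 64))         # ground truth: all foreground
loss = sampled_bce_loss(pred, gt)
```

With 16 points instead of 64 x 64 = 4096 pixels, the per-mask loss computation touches two orders of magnitude fewer values, which is where the training-efficiency gain comes from.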
Model Capabilities
Image Instance Segmentation
Multi-scale Feature Extraction
High-precision Object Boundary Recognition
Use Cases
Autonomous Driving
Street Scene Object Recognition
Identifies instances such as vehicles and pedestrians in urban street scenes
Achieves strong performance on the Cityscapes dataset
Smart Surveillance
Scene Analysis
Precisely segments and identifies objects in surveillance footage