
Mask2Former Swin Large COCO Instance

Developed by facebook
Mask2Former is a Transformer-based unified image segmentation model. This checkpoint uses a Swin-Large backbone and was fine-tuned on the COCO dataset for instance segmentation.
Downloads: 37.31k
Release Time: 1/2/2023

Model Overview

The model performs instance segmentation by predicting a set of binary masks and their corresponding class labels, and uses multi-scale deformable attention to improve performance. It is an improved successor to MaskFormer.
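
As a quick orientation, below is a minimal inference sketch using the Hugging Face transformers library with this checkpoint's identifier (facebook/mask2former-swin-large-coco-instance). The example image URL is a standard COCO sample used purely for illustration.

```python
import torch
import requests
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation

# Load the processor and the instance-segmentation checkpoint
processor = AutoImageProcessor.from_pretrained("facebook/mask2former-swin-large-coco-instance")
model = Mask2FormerForUniversalSegmentation.from_pretrained("facebook/mask2former-swin-large-coco-instance")

# A sample COCO image (any RGB image works)
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)  # class logits and mask logits for a fixed set of queries

# Merge the per-query predictions into an instance segmentation map
result = processor.post_process_instance_segmentation(
    outputs, target_sizes=[image.size[::-1]]
)[0]
print(result["segmentation"].shape)  # (H, W) map of instance ids
print(result["segments_info"][:3])   # label id and score per detected instance
```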

Model Features

Unified Segmentation Framework
Handles instance/semantic/panoptic segmentation tasks with the same architecture, simplifying the workflow.
Multi-scale Deformable Attention
Replaces the conventional pixel decoder with multi-scale deformable attention layers, improving multi-scale feature extraction.
Masked Attention Mechanism
Introduces masked attention in the Transformer decoder, enhancing performance without increasing computational burden (a minimal sketch follows this list).
Efficient Training Strategy
Calculates the mask loss on sampled points rather than entire masks, reducing training memory by roughly 3x.
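
To make the masked-attention idea concrete, here is an illustrative sketch, not the model's actual implementation: cross-attention scores are suppressed outside the foreground region predicted by the previous decoder layer, with a fallback to unrestricted attention when a query's mask is empty. Shapes and the 0.5 threshold are assumptions.

```python
import torch

def masked_cross_attention(queries, keys, values, prev_mask_logits, threshold=0.5):
    """Illustrative sketch of masked cross-attention; shapes and threshold are assumptions.

    queries:          (N, d)   object queries
    keys, values:     (HW, d)  flattened image features
    prev_mask_logits: (N, HW)  mask prediction from the previous decoder layer
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / d ** 0.5                 # standard scaled dot-product scores
    foreground = prev_mask_logits.sigmoid() > threshold  # attend only inside the predicted mask
    # If a query predicts an empty mask, fall back to full attention to avoid all -inf rows
    foreground = foreground | ~foreground.any(dim=-1, keepdim=True)
    scores = scores.masked_fill(~foreground, float("-inf"))
    return scores.softmax(dim=-1) @ values               # (N, d) updated queries
```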

Model Capabilities

Image Instance Segmentation
Multi-object Detection and Segmentation
Complex Scene Parsing

Use Cases

Computer Vision
Object Instance Segmentation
Generates precise segmentation masks for each object instance in an image (a post-processing sketch follows this section).
Achieves state-of-the-art instance segmentation results on the COCO dataset.
Scene Understanding
Analyzes object distribution and spatial relationships in complex scenes.
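
As an illustration of how per-instance results can be consumed downstream, for example to inspect object distribution in a scene, the sketch below iterates over the output of post_process_instance_segmentation from the Model Overview snippet. The helper names mask_to_box and summarize_instances are hypothetical.

```python
import torch

def mask_to_box(binary_mask: torch.Tensor):
    """Hypothetical helper: bounding box (x_min, y_min, x_max, y_max) of a boolean (H, W) mask."""
    ys, xs = torch.where(binary_mask)
    if ys.numel() == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

def summarize_instances(result, id2label):
    """result: output of post_process_instance_segmentation (see the Model Overview sketch);
    id2label: model.config.id2label, mapping class ids to names."""
    segmentation = result["segmentation"]          # (H, W) map of instance ids
    for seg in result["segments_info"]:
        instance_mask = segmentation == seg["id"]  # boolean mask for one instance
        print(id2label[seg["label_id"]], round(seg["score"], 3), mask_to_box(instance_mask))
```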