
Video Mask2Former Swin Tiny YouTubeVIS 2021 Instance

Developed by shivalikasingh
A video instance segmentation model with a Swin Transformer (Tiny) backbone, trained on the YouTubeVIS-2021 dataset
Downloads 22
Release Time: 3/15/2023

Model Overview

Video Mask2Former is an extension of Mask2Former to video instance segmentation. It keeps the same unified architecture and segments a video by predicting a set of masks together with their corresponding class labels.
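The snippet below is a minimal inference sketch of that idea: load the checkpoint, run a single frame, and turn the predicted mask/label pairs into an instance map. The hub id and the use of the image Mask2Former classes from Hugging Face transformers are assumptions; the exact loading path for the video variant may differ.

```python
# Minimal inference sketch. Assumes the checkpoint is loadable with the image
# Mask2Former classes and that the hub id below is correct (both assumptions).
import torch
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation

checkpoint = "shivalikasingh/video-mask2former-swin-tiny-youtubevis-2021-instance"  # assumed hub id
processor = AutoImageProcessor.from_pretrained(checkpoint)
model = Mask2FormerForUniversalSegmentation.from_pretrained(checkpoint)

frame = Image.open("frame_000.jpg")  # a single video frame (illustrative path)
inputs = processor(images=frame, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Convert the predicted mask/label pairs into an instance segmentation map.
result = processor.post_process_instance_segmentation(
    outputs, target_sizes=[frame.size[::-1]]
)[0]
print(result["segmentation"].shape)  # (height, width) map of instance ids
print(result["segments_info"])       # per-instance class label and score
```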

Model Features

Unified Segmentation Architecture
Handles instance, semantic, and panoptic segmentation with a single paradigm, treating all three tasks as if they were instance segmentation
Improved Attention Mechanism
Replaces the pixel decoder with a multi-scale deformable attention Transformer and employs a Transformer decoder with masked attention to enhance performance
Efficient Training Method
Computes losses on sampled points rather than entire masks, which significantly improves training efficiency (see the sketch after this list)
Video Processing Capability
Directly applies to video instance segmentation tasks without modifying the architecture and achieves state-of-the-art performance
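As an illustration of the point-sampling idea mentioned above, here is a toy PyTorch sketch that computes a mask loss on K randomly sampled points instead of every pixel. It is not Mask2Former's actual implementation (which mixes importance-based and uniform sampling); the shapes, point count, and function name are illustrative assumptions.

```python
# Toy sketch of point-sampled mask loss: sample K points per mask and compute
# BCE there instead of over all H*W pixels. Uniform random sampling only.
import torch
import torch.nn.functional as F

def point_sample_loss(pred_masks, gt_masks, num_points=112 * 112):
    """pred_masks: (N, H, W) mask logits; gt_masks: (N, H, W) binary targets."""
    n, h, w = pred_masks.shape
    # Random point coordinates in [-1, 1], as expected by grid_sample.
    coords = torch.rand(n, num_points, 1, 2) * 2 - 1
    pred_pts = F.grid_sample(
        pred_masks.unsqueeze(1), coords, align_corners=False
    ).squeeze(1).squeeze(-1)                      # (N, num_points)
    gt_pts = F.grid_sample(
        gt_masks.unsqueeze(1).float(), coords, align_corners=False
    ).squeeze(1).squeeze(-1)
    # Loss is evaluated on K points per mask rather than every pixel.
    return F.binary_cross_entropy_with_logits(pred_pts, gt_pts)

# Example: 3 predicted instance masks at 256x256 resolution.
loss = point_sample_loss(torch.randn(3, 256, 256), torch.randint(0, 2, (3, 256, 256)))
```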

Model Capabilities

Video Instance Segmentation
Object Mask Prediction
Multi-frame Video Analysis

Use Cases

Video Analysis
Video Object Tracking and Segmentation
Performs instance segmentation and tracking of objects in videos
Generates frame-by-frame object segmentation masks (see the sketch after this section)
Autonomous Driving Scene Understanding
Analyzes road scene videos to identify and segment various traffic participants
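Below is a hedged sketch of the frame-by-frame workflow described above, reusing the inference pattern from the overview. Reading frames with OpenCV and looping per frame is a simplification of the clip-level pipeline (instance ids are only guaranteed to stay consistent across frames when the model processes a clip jointly), and the file name and hub id are illustrative assumptions.

```python
# Frame-by-frame segmentation of a video file. Loading via the image
# Mask2Former classes and the hub id are assumptions, as in the earlier sketch.
import cv2
import torch
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation

checkpoint = "shivalikasingh/video-mask2former-swin-tiny-youtubevis-2021-instance"  # assumed hub id
processor = AutoImageProcessor.from_pretrained(checkpoint)
model = Mask2FormerForUniversalSegmentation.from_pretrained(checkpoint)

cap = cv2.VideoCapture("road_scene.mp4")  # illustrative input video
per_frame_results = []

while True:
    ok, bgr = cap.read()
    if not ok:
        break
    frame = Image.fromarray(cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB))
    inputs = processor(images=frame, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # One instance map plus per-instance labels/scores for this frame.
    result = processor.post_process_instance_segmentation(
        outputs, target_sizes=[frame.size[::-1]]
    )[0]
    per_frame_results.append(result)

cap.release()
print(f"Segmented {len(per_frame_results)} frames")
```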