
Swinv2 Base Patch4 Window12to24 192to384 22kto1k Ft

Developed by Microsoft
Swin Transformer v2 is a vision transformer that serves as a general-purpose backbone for image classification and dense recognition tasks, using hierarchical feature maps and self-attention computed within local windows for efficiency.
Downloads 1,824
Release date: June 16, 2022

Model Overview

This model was pretrained on ImageNet-21k and fine-tuned on ImageNet-1k at 384×384 resolution, and can be used directly for image classification.
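The checkpoint name encodes the geometry of this transfer: patch size 4, window size 12 at the 192 px pretraining resolution, window size 24 at the 384 px fine-tuning resolution. A small sketch (my reading of the name, not an official specification) shows why doubling both the resolution and the window size keeps the window grid unchanged:

```python
def windows_per_side(image_size: int, patch_size: int, window_size: int) -> int:
    """Number of non-overlapping attention windows along one image side."""
    tokens_per_side = image_size // patch_size   # patch-embedding grid
    return tokens_per_side // window_size        # windows along that side

# Pretraining on ImageNet-21k: 192 px, window 12 -> 48 tokens/side, 4 windows/side
pre = windows_per_side(192, 4, 12)
# Fine-tuning on ImageNet-1k: 384 px, window 24 -> 96 tokens/side, 4 windows/side
ft = windows_per_side(384, 4, 24)
print(pre, ft)  # -> 4 4: the window layout is preserved across the resolution change
```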

Model Features

Hierarchical Feature Maps
Builds hierarchical feature maps by merging neighboring image patches in deeper layers, making the model suitable for images of different resolutions and for dense prediction.
Local Window Self-Attention
Computes self-attention only within non-overlapping local windows, so computational complexity grows linearly with input image size rather than quadratically.
Training Stability Improvements
Improves training stability by combining residual post-normalization with scaled cosine attention.
High-Resolution Transfer Capability
Uses a log-spaced continuous position bias so that models pretrained at low resolution transfer effectively to high-resolution downstream tasks.
Self-Supervised Pretraining
Introduces the SimMIM self-supervised pretraining method (masked image modeling), reducing the need for large amounts of labeled images.
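The linear-complexity claim for window attention can be made concrete with a back-of-the-envelope cost model (a sketch counting token-pair interactions, not actual FLOPs):

```python
def global_attention_cost(h: int, w: int) -> int:
    """Every token attends to every token: O(n^2) in token count n = h * w."""
    n = h * w
    return n * n

def window_attention_cost(h: int, w: int, m: int) -> int:
    """Each token attends only within its m x m window: O(n * m^2), linear in n."""
    n = h * w
    return n * m * m

# Doubling the side length quadruples the token count:
# global cost grows 16x, while windowed cost (fixed window m) grows only 4x.
print(global_attention_cost(96, 96) // global_attention_cost(48, 48))      # -> 16
print(window_attention_cost(96, 96, 12) // window_attention_cost(48, 48, 12))  # -> 4
```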

Model Capabilities

Image Classification
Dense Recognition

Use Cases

Image Recognition
Animal Recognition
Identifies animals in images, such as tigers, classifying them into one of the 1,000 ImageNet-1k categories.
Object Recognition
Identifies everyday objects, such as teapots, classifying them into one of the 1,000 ImageNet-1k categories.
Scene Recognition
Identifies architectural or natural scenes, such as palaces, classifying them into one of the 1,000 ImageNet-1k categories.
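The use cases above all reduce to single-label classification with this checkpoint. A minimal inference sketch via the Hugging Face Transformers library (the image path is a placeholder, and the checkpoint id is assumed from this card's title):

```python
MODEL_ID = "microsoft/swinv2-base-patch4-window12to24-192to384-22kto1k-ft"

def top1(logits) -> int:
    """Return the index of the highest-scoring class."""
    return max(range(len(logits)), key=lambda i: logits[i])

def classify(image_path: str) -> str:
    # Heavy imports kept local so the sketch can be read without the libraries installed.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, Swinv2ForImageClassification

    processor = AutoImageProcessor.from_pretrained(MODEL_ID)
    model = Swinv2ForImageClassification.from_pretrained(MODEL_ID)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return model.config.id2label[top1(logits)]  # human-readable ImageNet-1k label

# Example (requires network to fetch the checkpoint):
#   print(classify("tiger.jpg"))
```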