
Swinv2 Large Patch4 Window12 192 22k

Developed by Microsoft
Swin Transformer V2 is a vision Transformer that handles image classification and dense recognition tasks efficiently by building hierarchical feature maps and computing self-attention within local windows.
Downloads: 3,816
Release Time: 6/15/2022

Model Overview

This model was pre-trained on the ImageNet-21k dataset at 192×192 resolution. It incorporates residual post-normalization and cosine attention to improve training stability, and is suited to image classification tasks.
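The checkpoint can be used directly with the Hugging Face transformers library. The snippet below is a minimal classification sketch; the Hub id microsoft/swinv2-large-patch4-window12-192-22k and the local file cat.jpg are illustrative assumptions, not taken from this page.

```python
# Minimal classification sketch; the checkpoint id and image path are
# illustrative assumptions.
from PIL import Image
from transformers import AutoImageProcessor, Swinv2ForImageClassification

name = "microsoft/swinv2-large-patch4-window12-192-22k"
processor = AutoImageProcessor.from_pretrained(name)
model = Swinv2ForImageClassification.from_pretrained(name)

image = Image.open("cat.jpg")                           # any RGB image
inputs = processor(images=image, return_tensors="pt")   # resized to 192x192
logits = model(**inputs).logits                         # one score per ImageNet-21k class
label_id = logits.argmax(-1).item()
print(model.config.id2label[label_id])
```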

Model Features

Hierarchical feature maps
Constructs hierarchical feature maps by merging image patches at deeper layers, improving feature extraction efficiency.
Local window self-attention
Computes self-attention only within local windows, making the computational complexity linear with respect to input image size.
Training stability improvements
Combines residual post-normalization and cosine attention mechanisms to enhance training stability.
High-resolution transfer
Uses a log-spaced continuous position bias method to transfer models pre-trained at low resolution to high-resolution tasks (a toy sketch of the coordinate transform follows this list).
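As a rough illustration of the log-spaced position handling mentioned above, the toy sketch below maps 1-D relative offsets within an attention window into log-spaced coordinates, following the sign(x)·log(1+|x|) form from the Swin V2 paper. It is a simplified re-implementation for illustration, not the model's own code.

```python
# Toy 1-D sketch of the log-spaced relative-coordinate transform behind the
# continuous position bias; simplified re-implementation for illustration.
import torch

def log_spaced_offsets(window_size: int) -> torch.Tensor:
    coords = torch.arange(window_size)
    # All pairwise offsets between positions in one attention window,
    # ranging from -(window_size - 1) to window_size - 1.
    rel = (coords[None, :] - coords[:, None]).float()
    # Log-spaced mapping: small offsets keep fine resolution, large offsets
    # are compressed, so a bias learned at low resolution extrapolates more
    # gracefully to the larger windows used at high resolution.
    return torch.sign(rel) * torch.log2(1.0 + rel.abs())

print(log_spaced_offsets(8))  # transformed offsets for a window of 8 positions
```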

Model Capabilities

Image classification
Visual feature extraction
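For the second capability, visual feature extraction, the backbone can be used without its classification head. The sketch below is a minimal example under the same assumed checkpoint id and image path as above.

```python
# Feature-extraction sketch using the backbone without a classification head;
# the checkpoint id and image path are illustrative assumptions.
import torch
from PIL import Image
from transformers import AutoImageProcessor, Swinv2Model

name = "microsoft/swinv2-large-patch4-window12-192-22k"
processor = AutoImageProcessor.from_pretrained(name)
model = Swinv2Model.from_pretrained(name)

inputs = processor(images=Image.open("cat.jpg"), return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
embedding = outputs.pooler_output      # (1, hidden_size) global image descriptor
print(embedding.shape)
```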

Use Cases

Image recognition
Animal recognition
Identifies animal species in images, such as tigers.
Object recognition
Identifies everyday objects, such as teapots.
Scene recognition
Identifies architectural or natural scenes, such as palaces.