
swinv2-large-patch4-window12to16-192to256-22kto1k-ft

Developed by Microsoft
Swin Transformer V2 is a vision Transformer that builds hierarchical feature maps and computes self-attention within local windows, making it efficient for image classification and dense recognition tasks.
Downloads: 812
Released: June 16, 2022

Model Overview

This model was pretrained on ImageNet-21k and fine-tuned on ImageNet-1k at 256x256 resolution, suitable for image classification tasks.
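A minimal inference sketch for this checkpoint, assuming the `transformers`, `torch`, and `Pillow` packages are installed; the Hugging Face model id is taken from the title above, and the blank PIL image is only a stand-in for a real photo.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, Swinv2ForImageClassification

model_id = "microsoft/swinv2-large-patch4-window12to16-192to256-22kto1k-ft"
processor = AutoImageProcessor.from_pretrained(model_id)
model = Swinv2ForImageClassification.from_pretrained(model_id)

image = Image.new("RGB", (256, 256))  # stand-in for a real photo
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # one score per ImageNet-1k class

print(model.config.id2label[logits.argmax(-1).item()])
```

For a meaningful prediction, load an actual image with `Image.open(...)` instead of the blank placeholder.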

Model Features

Hierarchical Feature Maps
Constructs hierarchical feature maps by merging image patches in deeper layers, improving feature extraction efficiency.
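The stage layout this implies can be sketched with a few lines of arithmetic. This is an illustration, not the model code; the patch size (4) comes from the model name, and the base embedding width of 192 is the standard SwinV2-Large configuration.

```python
def stage_shapes(image_size, patch_size=4, num_stages=4, embed_dim=192):
    """Token-grid side length and channel width per stage: each
    patch-merging step halves spatial resolution and doubles channels."""
    side = image_size // patch_size
    shapes = []
    for _ in range(num_stages):
        shapes.append((side, embed_dim))
        side //= 2
        embed_dim *= 2
    return shapes

print(stage_shapes(256))  # [(64, 192), (32, 384), (16, 768), (8, 1536)]
```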
Local Window Self-Attention
Computes self-attention only within local windows, resulting in linear computational complexity relative to input image size.
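The complexity difference can be made concrete by counting token-pair interactions, which is what the attention cost is proportional to. This is an illustrative count only, using this model's 64x64 token grid (256px input / patch size 4) and 16x16 windows.

```python
def global_attention_pairs(h, w):
    """Every token attends to every token: (h*w)^2 pairs, quadratic in area."""
    n = h * w
    return n * n

def window_attention_pairs(h, w, m):
    """Tokens attend only within non-overlapping m x m windows:
    (h*w / m^2) windows, each costing m^4 pairs -> h*w*m^2, linear in area."""
    num_windows = (h // m) * (w // m)
    return num_windows * (m * m) ** 2

print(global_attention_pairs(64, 64))      # 16777216
print(window_attention_pairs(64, 64, 16))  # 1048576
```

Doubling the image side quadruples the area; the windowed count grows by the same factor of 4 (linear in area), while the global count grows by 16.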
Residual Post-Normalization
Uses residual post-normalization combined with cosine attention to enhance training stability.
Log-Spaced Continuous Position Bias
Effectively transfers models pretrained on low-resolution images to downstream tasks with high-resolution inputs.
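Why log-spacing helps transfer can be seen numerically: when the window grows from 12x12 (pretraining) to 16x16 (fine-tuning), the largest relative offset grows from 11 to 15 linearly, but only slightly in log space, so the position-bias network extrapolates much less. A sketch of the coordinate mapping (log base chosen here for illustration):

```python
import math

def log_spaced(delta):
    """Map a linear relative coordinate to log space:
    sign(delta) * log2(1 + |delta|)."""
    return math.copysign(math.log2(1 + abs(delta)), delta)

print(round(log_spaced(11), 2))  # 3.58  (largest offset in a 12x12 window)
print(round(log_spaced(15), 2))  # 4.0   (largest offset in a 16x16 window)
```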
Self-Supervised Pretraining
Employs SimMIM self-supervised pretraining method, reducing reliance on large amounts of labeled images.

Model Capabilities

Image Classification
Visual Feature Extraction

Use Cases

Image Recognition
Animal Recognition
Identifies animal species in images, such as tigers.
Object Recognition
Recognizes everyday objects, such as teapots.
Scene Recognition
Identifies complex scenes, such as palaces.