
Swinv2 Base Patch4 Window8 256

Developed by Microsoft
Swin Transformer V2 is a vision Transformer that handles image classification and dense recognition tasks efficiently through hierarchical feature maps and self-attention computed within local windows.
Downloads: 16.61k
Release date: June 15, 2022

Model Overview

This model was pretrained on the ImageNet-1k dataset at 256x256 resolution. It incorporates techniques for improved training stability and high-resolution transfer, making it well suited to image classification tasks.
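A minimal inference sketch with the Hugging Face Transformers library, assuming the checkpoint id `microsoft/swinv2-base-patch4-window8-256` (inferred from the model title) and that `torch`, `transformers`, and `Pillow` are installed; weights are downloaded on first use:

```python
from PIL import Image
import numpy as np
import torch
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Checkpoint id assumed from the model title above.
ckpt = "microsoft/swinv2-base-patch4-window8-256"
processor = AutoImageProcessor.from_pretrained(ckpt)
model = AutoModelForImageClassification.from_pretrained(ckpt)

# Any RGB image works; a random 256x256 array serves as a stand-in here.
image = Image.fromarray(np.random.randint(0, 255, (256, 256, 3), dtype=np.uint8))
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape (1, 1000): one score per ImageNet-1k class

pred = logits.argmax(-1).item()
print(model.config.id2label[pred])
```

The processor resizes and normalizes the image to the 256x256 resolution the model was pretrained at, so arbitrary input sizes are handled automatically.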

Model Features

Hierarchical Feature Maps
Constructs hierarchical feature maps by merging image patches, suitable for processing images at different resolutions.
Local Window Self-Attention
Computes self-attention only within local windows, resulting in linear computational complexity relative to input image size, thereby improving efficiency.
Improved Training Stability
Incorporates residual post-normalization and cosine attention to enhance training stability.
High-Resolution Transfer
Uses log-spaced continuous position bias to effectively transfer low-resolution pretrained models to downstream tasks with high-resolution inputs.

Model Capabilities

Image Classification
Dense Recognition Tasks

Use Cases

Computer Vision
ImageNet Image Classification
Classifies images into one of the 1,000 categories in ImageNet.