Swinv2 Base Patch4 Window12 192 22k

Developed by Microsoft
Swin Transformer V2 is a vision Transformer that processes images efficiently by building hierarchical feature maps and computing self-attention within local windows.
Downloads 8,603
Release Time: 6/15/2022

Model Overview

This model is pretrained on the ImageNet-21k dataset at 192x192 resolution and is suited to image classification tasks. It incorporates the SwinV2 improvements: residual post-normalization, scaled cosine attention, and a log-spaced continuous position bias.
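The model name encodes its geometry: 4x4 pixel patches, attention windows of 12x12 tokens, 192x192 input, and ImageNet-22k (21k-class) pretraining. A quick sanity check of the resulting token grid, as pure arithmetic with no model download:

```python
# Geometry implied by the name "patch4-window12-192".
image_size = 192
patch_size = 4
window_size = 12

# Patch embedding splits the image into non-overlapping patches.
grid = image_size // patch_size          # 48 patches per side
tokens = grid * grid                     # 2304 tokens at the first stage

# Window attention partitions that grid into fixed-size blocks, so the
# attention cost per window is constant and total cost scales linearly
# with image area.
windows_per_side = grid // window_size   # 4
tokens_per_window = window_size ** 2     # 144

print(grid, tokens, windows_per_side, tokens_per_window)  # 48 2304 4 144
```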

Model Features

Hierarchical Feature Map Construction
Builds hierarchical feature maps by merging neighboring image patches in deeper layers, improving feature-extraction efficiency.
Local Window Self-Attention
Computes self-attention only within local windows, so computational complexity scales linearly with input image size rather than quadratically.
Training Stability Improvements
Uses residual post-normalization and scaled cosine attention to improve training stability at large model sizes.
High-Resolution Transfer Capability
Uses a log-spaced continuous position bias so models pretrained at low resolution transfer effectively to high-resolution inputs.
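The log-spaced position bias maps each relative offset d between tokens to sign(d) * log(1 + |d|) before feeding it to a small bias network, so doubling the window size only slightly extends the coordinate range. A minimal sketch of just that coordinate transform (the SwinV2 paper additionally normalizes by the maximum offset, omitted here for brevity):

```python
import math

def log_spaced(delta: float) -> float:
    """Log-spaced relative coordinate: sign(d) * log(1 + |d|).
    Compresses large offsets so a bias network trained with window 12
    extrapolates to the offsets seen in larger windows."""
    return math.copysign(math.log1p(abs(delta)), delta)

# Linear offsets double (8 -> 16) but log-spaced coordinates grow slowly,
# which is what makes low-to-high resolution transfer stable.
print(round(log_spaced(8), 3), round(log_spaced(16), 3))  # 2.197 2.833
```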

Model Capabilities

Image Classification
Visual Feature Extraction

Use Cases

Computer Vision
ImageNet Image Classification
Classifies input images into one of the 21,841 ImageNet-21k classes.
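A usage sketch with the Hugging Face Transformers library, following its standard image-classification pattern (`AutoImageProcessor` plus `Swinv2ForImageClassification`); the synthetic gray image stands in for real input, and running this downloads the checkpoint from the Hub:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, Swinv2ForImageClassification

model_id = "microsoft/swinv2-base-patch4-window12-192-22k"
processor = AutoImageProcessor.from_pretrained(model_id)
model = Swinv2ForImageClassification.from_pretrained(model_id)

# Synthetic 192x192 RGB image; replace with Image.open(...) for real data.
image = Image.new("RGB", (192, 192), color=(128, 128, 128))
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # one logit per ImageNet-21k class

predicted = logits.argmax(-1).item()
print(model.config.id2label[predicted])
```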