
Swinv2 Base Patch4 Window12to16 192to256 22kto1k Ft

Developed by Microsoft
Swin Transformer V2 is a vision Transformer that achieves efficient image classification through hierarchical feature maps and local window-based self-attention.
Downloads: 459
Release date: June 16, 2022

Model Overview

This model is pre-trained on ImageNet-21k and fine-tuned on ImageNet-1k, suitable for image classification tasks. It incorporates improvements such as residual post-normalization, cosine attention, and log-spaced continuous position bias.
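The log-spaced continuous position bias mentioned above is what allows the 192-pixel pre-trained model to be fine-tuned at 256-pixel resolution. A minimal sketch of the coordinate transform behind it (illustrative only; the actual bias is produced by a small MLP on these coordinates, and the normalization constant here is an assumed example):

```python
import math

# Sketch of the log-spaced relative-coordinate transform used by continuous
# position bias: linear offsets are compressed logarithmically, so a bias
# learned with small windows at low resolution extrapolates smoothly to the
# larger windows used at high resolution.
def log_spaced(delta, max_delta=8):
    # sign(d) * log2(1 + |d|), normalized so |coordinate| <= 1 at max_delta
    # (max_delta=8 is an illustrative choice, not taken from the paper)
    return math.copysign(math.log2(1 + abs(delta)), delta) / math.log2(1 + max_delta)
```

Because offsets twice as large map to only slightly larger log-spaced coordinates, a bias trained on 12x12 windows degrades gracefully when transferred to 16x16 windows.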

Model Features

Hierarchical Feature Maps
Builds hierarchical feature maps by merging neighboring image patches in deeper layers, making the model suitable for both image classification and dense recognition tasks.
Local Window Self-Attention
Computes self-attention only within local windows, resulting in computational complexity that scales linearly with input image size.
Training Stability Improvements
Enhances training stability through residual post-normalization and cosine attention.
High-Resolution Transfer Capability
Uses log-spaced continuous position bias to effectively transfer low-resolution pre-trained models to high-resolution input tasks.
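The linear-complexity claim for window attention can be illustrated with a back-of-the-envelope operation count (a sketch; the window size of 16 is taken from this checkpoint's name):

```python
# Illustrative arithmetic, not library code: attention restricted to fixed
# local windows grows linearly with image area, while global self-attention
# grows quadratically.
def global_attention_cost(h, w):
    n = h * w                # number of patch tokens
    return n * n             # every token attends to every other token

def window_attention_cost(h, w, m=16):
    n = h * w
    return n * m * m         # each token attends only within an m x m window

# Doubling each spatial side (4x the tokens) multiplies the global cost by 16,
# but the windowed cost only by 4 -- i.e. linear in image area.
print(global_attention_cost(64, 64) // global_attention_cost(32, 32))  # 16
print(window_attention_cost(64, 64) // window_attention_cost(32, 32))  # 4
```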

Model Capabilities

Image Classification
Visual Feature Extraction

Use Cases

Computer Vision
ImageNet Image Classification
Classifies an input image into one of the 1,000 ImageNet-1k categories.
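A usage sketch with the Hugging Face transformers library (the checkpoint identifier is assumed from this page's title; any RGB image can be substituted for the example URL):

```python
# Sketch: classify an image into one of the 1,000 ImageNet-1k categories.
from PIL import Image
import requests
from transformers import AutoImageProcessor, Swinv2ForImageClassification

checkpoint = "microsoft/swinv2-base-patch4-window12to16-192to256-22kto1k-ft"
processor = AutoImageProcessor.from_pretrained(checkpoint)
model = Swinv2ForImageClassification.from_pretrained(checkpoint)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(images=image, return_tensors="pt")
logits = model(**inputs).logits            # shape: (1, 1000)
predicted = logits.argmax(-1).item()
print(model.config.id2label[predicted])    # predicted ImageNet-1k label
```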