
Swin Large Patch4 Window12 384 In22k

Developed by Microsoft
Swin Transformer is a hierarchical, window-based vision Transformer pretrained on the ImageNet-21k dataset and suited to image classification tasks.
Downloads: 1,063
Release Time: 3/2/2022

Model Overview

This model builds hierarchical feature maps by computing self-attention within local windows, giving computational complexity linear in input image size and making it suitable as a backbone for image classification and dense recognition tasks.
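The window partitioning described above can be sketched in a few lines of NumPy. This is an illustrative reshape only (not the model's actual implementation): it shows how a feature map is split into non-overlapping windows so that self-attention runs inside each window independently.

```python
import numpy as np

def window_partition(x, window_size):
    """Split an (H, W, C) feature map into non-overlapping windows.

    Self-attention is then computed inside each window independently,
    so cost grows with the number of windows, i.e. linearly with area.
    """
    H, W, C = x.shape
    ws = window_size
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    # -> (num_windows, ws*ws, C): each row group is one attention "sequence"
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws * ws, C)

# A 384x384 input with patch size 4 yields a 96x96 token grid at stage 1;
# with window size 12 that gives 8x8 = 64 windows of 144 tokens each.
feat = np.zeros((96, 96, 192))
windows = window_partition(feat, 12)
print(windows.shape)  # (64, 144, 192)
```

The stage-1 channel width of 192 matches the Swin-Large embedding dimension; the grid and window sizes follow from the model name (patch4, window12, 384).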

Model Features

Hierarchical Window Attention Mechanism
Computes self-attention within local windows, cutting the cost of global attention from quadratic to linear in image size.
Hierarchical Feature Map Construction
Merges image patches in deeper layers to build multi-resolution feature maps, unlike traditional vision Transformers that produce a single low-resolution feature map.
High-Resolution Support
Supports 384×384 high-resolution input; pretrained on the large-scale ImageNet-21k dataset.
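The linear-complexity claim above can be verified with back-of-the-envelope arithmetic. The sketch below counts only the token-mixing term of attention (projections and constants omitted, so the absolute numbers are not real FLOP counts), comparing global attention with fixed-window attention:

```python
def attention_flops(h, w, c, window=None):
    """Rough token-mixing cost of self-attention over an h*w token grid.

    Global attention scales as (h*w)^2 * c. Windowed attention runs over
    (h*w)/window^2 windows of window^2 tokens each, giving
    window^2 * h*w * c -- linear in h*w for a fixed window size.
    """
    n = h * w
    if window is None:
        return n * n * c  # global: quadratic in token count
    return (window ** 2) * n * c  # windowed: linear in token count

# Doubling the side length quadruples the token count: global cost grows
# 16x, while windowed cost grows only 4x.
base_global = attention_flops(96, 96, 192)
base_window = attention_flops(96, 96, 192, window=12)
big_global = attention_flops(192, 192, 192)
big_window = attention_flops(192, 192, 192, window=12)
print(big_global // base_global, big_window // base_window)  # 16 4
```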

Model Capabilities

Image Classification
Visual Feature Extraction
Large-scale Image Recognition

Use Cases

Computer Vision
General Image Classification
Classifies images into one of the 21,841 ImageNet-21k categories
Visual Backbone Network
Can serve as a feature extractor for downstream vision tasks (e.g., object detection, segmentation)
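A minimal usage sketch for the classification use case, assuming the Hugging Face `transformers`, `torch`, `Pillow`, and `requests` packages are installed (the sample image URL is an illustrative COCO image, not part of this model card):

```python
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

ckpt = "microsoft/swin-large-patch4-window12-384-in22k"
processor = AutoImageProcessor.from_pretrained(ckpt)
model = AutoModelForImageClassification.from_pretrained(ckpt)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# The processor resizes and normalizes the image to the 384x384 input size.
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape (1, 21841): ImageNet-21k classes
print(model.config.id2label[logits.argmax(-1).item()])
```

For the backbone use case, the same checkpoint can be loaded without the classification head (e.g. via `AutoModel`) to obtain hierarchical features for detection or segmentation.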