
Swin Small Patch4 Window7 224

Developed by Microsoft
Swin Transformer is a hierarchical, window-based vision Transformer designed for image classification, with computational complexity that scales linearly with input image size.
Downloads 2,028
Release Time: 3/2/2022

Model Overview

This model was trained on the ImageNet-1k dataset at 224x224 resolution and can serve as a general backbone network for image classification and dense recognition tasks.
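A minimal inference sketch with the Hugging Face transformers library is shown below. The checkpoint id microsoft/swin-small-patch4-window7-224 and the local image path are assumptions for illustration; adjust them to your setup.

    from PIL import Image
    import torch
    from transformers import AutoImageProcessor, SwinForImageClassification

    # Checkpoint id assumed for illustration; the processor resizes and
    # normalizes the image to the 224x224 resolution the model was trained on.
    model_name = "microsoft/swin-small-patch4-window7-224"
    processor = AutoImageProcessor.from_pretrained(model_name)
    model = SwinForImageClassification.from_pretrained(model_name)

    # Any RGB image works; a local file is assumed here.
    image = Image.open("cat.jpg").convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits

    # Map the highest-scoring logit to one of the 1,000 ImageNet-1k labels.
    predicted_class = logits.argmax(-1).item()
    print(model.config.id2label[predicted_class])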

Model Features

Hierarchical Window Attention Mechanism
Computes self-attention within local, non-overlapping windows, reducing computational complexity so that it scales linearly with input image size (a minimal illustration follows this list).
Hierarchical Feature Maps
Builds hierarchical feature maps by progressively merging image patches in deeper layers, making the model suitable for visual information at multiple scales.
Efficient Computation
More computationally efficient than standard vision Transformers that compute global self-attention over all tokens.
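To make the linear-complexity claim concrete, the sketch below (illustrative PyTorch, not the model's internal code) partitions a feature map into non-overlapping 7x7 windows. Self-attention is then computed independently inside each window, so the per-window cost is constant and the total cost grows with the number of windows, i.e. linearly with image area.

    import torch

    def window_partition(x, window_size=7):
        # x: (B, H, W, C) feature map with H and W divisible by window_size.
        B, H, W, C = x.shape
        x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
        # Group tokens so each row is one local window of window_size*window_size tokens.
        windows = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size * window_size, C)
        return windows  # (B * num_windows, window_size**2, C)

    # Doubling the spatial size quadruples the number of windows, not the
    # per-window attention cost, hence the linear scaling with image area.
    x = torch.randn(1, 56, 56, 96)        # stage-1 feature map for a 224x224 input
    print(window_partition(x).shape)      # torch.Size([64, 49, 96])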

Model Capabilities

Image Classification
Visual Feature Extraction

Use Cases

Computer Vision
ImageNet Image Classification
Classifies input images into one of 1000 ImageNet categories
Trained on the ImageNet-1k dataset
Dense Recognition Tasks
Serves as a backbone network for object detection, semantic segmentation, and other dense prediction tasks
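For dense tasks, the hierarchical feature maps can be extracted from the plain encoder and fed to a detection or segmentation head. The sketch below is one way to do this with SwinModel from transformers; the checkpoint id and image path are assumptions, as above.

    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, SwinModel

    # Plain encoder without the classification head (checkpoint id assumed).
    model_name = "microsoft/swin-small-patch4-window7-224"
    processor = AutoImageProcessor.from_pretrained(model_name)
    model = SwinModel.from_pretrained(model_name)

    image = Image.open("street.jpg").convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)

    # Each entry is a (batch, tokens, channels) tensor from one stage of the
    # hierarchy; the token dimension can be reshaped to a 2D grid before being
    # passed to a detection or segmentation head.
    for i, h in enumerate(outputs.hidden_states):
        print(f"stage {i}: {tuple(h.shape)}")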