Swin Large Patch4 Window7 224

Developed by Microsoft
Swin Transformer is a hierarchical vision Transformer that achieves linear computational complexity by computing self-attention within local windows, making it suitable for image classification and dense recognition tasks.
Downloads: 2,079
Release Time: 3/2/2022

Model Overview

This model is a large-scale vision model based on the Swin Transformer architecture, trained on the ImageNet-1k dataset at 224x224 resolution, and can be used for image classification tasks.

Model Features

Hierarchical Feature Maps
Constructs hierarchical feature maps by merging image patches, suitable for processing visual information at different scales.
Local Window Attention
Computes self-attention only within local windows, making computational complexity linear with respect to input image size.
Efficient Architecture
Compared to traditional vision Transformers, it offers higher computational efficiency and is suitable as a general backbone network.
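The window-attention idea behind the linear complexity claim can be sketched in a few lines: the feature map is split into non-overlapping windows and self-attention is computed independently inside each one, so the per-window cost is constant and the total cost grows linearly with image area. A minimal NumPy sketch, assuming the paper's stage-1 shapes for this checkpoint (a 224px input with patch size 4 gives a 56x56 feature map; the function name `window_partition` is illustrative):

```python
import numpy as np

def window_partition(feature_map, window_size):
    """Split an (H, W, C) feature map into non-overlapping
    (window_size x window_size) windows.

    Returns an array of shape (num_windows, window_size, window_size, C).
    """
    H, W, C = feature_map.shape
    assert H % window_size == 0 and W % window_size == 0
    x = feature_map.reshape(H // window_size, window_size,
                            W // window_size, window_size, C)
    # Reorder so each window is contiguous: (nH, nW, ws, ws, C) -> flatten windows
    windows = x.transpose(0, 2, 1, 3, 4).reshape(-1, window_size, window_size, C)
    return windows

# Toy 56x56x96 feature map: stage-1 resolution for a 224px input, patch size 4
fmap = np.random.rand(56, 56, 96)
windows = window_partition(fmap, window_size=7)
print(windows.shape)  # (64, 7, 7, 96): attention runs per 7x7 window
```

Since each 7x7 window is a fixed-size attention problem, doubling the image area simply doubles the number of windows rather than quadrupling the attention cost, which is what makes the architecture linear in input size.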

Model Capabilities

Image Classification
Visual Feature Extraction

Use Cases

Computer Vision
Image Classification
Classifies input images into one of the 1,000 categories in ImageNet.
Achieves strong performance on the ImageNet-1k benchmark.
Visual Feature Extraction
Serves as a backbone network to extract image features for downstream vision tasks.
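As a minimal sketch of the input pipeline this 224x224 checkpoint expects: an RGB image is resized to 224x224, scaled to [0, 1], normalized channel-wise, and reshaped to NCHW. The mean/std values below are the commonly used ImageNet constants, an assumption that should be verified against the model's own preprocessor configuration:

```python
import numpy as np

# Commonly used ImageNet normalization constants (assumption: this
# checkpoint uses these values; verify against its preprocessor config).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(image_uint8):
    """Convert an (H, W, 3) uint8 RGB image (already resized to 224x224)
    into a (1, 3, 224, 224) float32 batch for the model."""
    assert image_uint8.shape == (224, 224, 3)
    x = image_uint8.astype(np.float32) / 255.0   # scale to [0, 1]
    x = (x - IMAGENET_MEAN) / IMAGENET_STD       # channel-wise normalization
    x = x.transpose(2, 0, 1)[None]               # HWC -> NCHW, add batch dim
    return x

img = np.zeros((224, 224, 3), dtype=np.uint8)    # placeholder image
batch = preprocess(img)
print(batch.shape)  # (1, 3, 224, 224)
```

The resulting tensor can be fed to the classification head (1,000 ImageNet classes) or to the backbone to obtain hierarchical features for downstream tasks.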