Swin Large Patch4 Window12 384

Developed by Microsoft
Swin Transformer is a hierarchical vision Transformer that computes self-attention within shifted windows. This checkpoint is intended for image classification.
Downloads 22.77k
Release Time: 3/2/2022

Model Overview

This model is trained on the ImageNet-1k dataset at 384x384 resolution. It computes self-attention only within local windows, so its computational complexity grows linearly with input image size. This makes it suitable as a backbone network for both image classification and dense recognition tasks.
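The linear-complexity claim can be checked with a little arithmetic. The sketch below uses the cost estimates given in the Swin Transformer paper for a global self-attention layer and a window-based one; the function names are made up for illustration, and the channel width of 192 is an assumption taken from the paper's Large configuration, not from this page.

```python
# Rough per-layer cost estimates from the Swin Transformer paper (softmax omitted).
# h, w: feature-map height and width in patches; c: channel width; m: window side.

def global_msa_cost(h: int, w: int, c: int) -> int:
    """Global self-attention: the second term is quadratic in the patch count h*w."""
    return 4 * h * w * c**2 + 2 * (h * w) ** 2 * c


def window_msa_cost(h: int, w: int, c: int, m: int) -> int:
    """Window self-attention: every term is linear in h*w when m is fixed."""
    return 4 * h * w * c**2 + 2 * m**2 * h * w * c


if __name__ == "__main__":
    c, m = 192, 12  # assumed Swin-L stage-1 width; window size from the model name
    for side in (96, 192):  # 96 = 384 / 4 patches per side; 192 doubles the image side
        g = global_msa_cost(side, side, c)
        win = window_msa_cost(side, side, c, m)
        print(f"{side}x{side} patches: global={g:.3e}  window={win:.3e}")
```

Doubling the image side quadruples the number of patches. The window-based cost then grows by exactly 4x, while the attention term of the global cost grows by 16x.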

Model Features

Hierarchical Feature Maps
Builds hierarchical feature maps by merging image patches in deeper layers, which lets the model capture features at multiple scales.
Local Window Self-Attention
Computes self-attention only within local windows, so computational complexity is linear in input image size rather than quadratic as with global self-attention.
High-Resolution Processing
Accepts 384x384 input images, which suits fine-grained image classification tasks.
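To make the features above concrete, the following sketch traces the feature-map size through the stages for a 384x384 input. The patch size (4) and window size (12) come from the model name. The four-stage layout, the 2x patch merging between stages, and the starting width of 192 are assumptions based on the Swin Transformer paper's Large configuration. The helper name is hypothetical.

```python
def swin_stage_shapes(image_size=384, patch_size=4, embed_dim=192,
                      window_size=12, num_stages=4):
    """Return (patches_per_side, channels, windows_per_map) for each stage.

    Assumes patch merging halves the spatial side and doubles the channel
    width between consecutive stages (an assumption, see the note above).
    """
    side = image_size // patch_size  # 384 / 4 = 96 patches per side
    dim = embed_dim
    shapes = []
    for stage in range(num_stages):
        if stage > 0:
            side //= 2   # merge each 2x2 group of neighbouring patches
            dim *= 2     # merged features are projected to twice the width
        windows = (side // window_size) ** 2  # non-overlapping local windows
        shapes.append((side, dim, windows))
    return shapes


if __name__ == "__main__":
    for i, (side, dim, windows) in enumerate(swin_stage_shapes(), start=1):
        print(f"stage {i}: {side}x{side} patches, {dim} channels, {windows} windows")
```

Under these assumptions the window size of 12 divides every stage's side length evenly, and the last stage is covered by a single window.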

Model Capabilities

Image Classification
Visual Feature Extraction

Use Cases

Computer Vision
ImageNet Image Classification
Classifies images into one of the 1000 ImageNet categories.
Described as high-accuracy; specific metrics are not provided.
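As a minimal sketch of the classification step, a classifier of this kind produces one score (logit) per category, and the prediction is the highest-scoring category. The code below shows only that post-processing on a plain list of 1000 scores. It does not load the model, the helper names are hypothetical, and the example scores are synthetic placeholders rather than real model output.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities, subtracting the max for stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, k=5):
    """Return the k most probable (class_index, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


if __name__ == "__main__":
    # Synthetic scores for 1000 categories; index 285 is an arbitrary placeholder.
    logits = [0.0] * 1000
    logits[285] = 10.0
    for index, prob in top_k(logits, k=3):
        print(index, round(prob, 4))
```

In practice the class index would be mapped to a human-readable label through the label table shipped with the model.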