
Swin Base Patch4 Window12 384

Developed by Microsoft
Swin Transformer is a hierarchical vision transformer built on shifted windows. It is designed for image classification, and its computational complexity is linear in the input image size.
Downloads 1,421
Release Time: 3/2/2022

Model Overview

This model was trained on the ImageNet-1k dataset at 384x384 resolution and can serve as a general backbone for image classification and dense recognition tasks.
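The model name encodes the geometry: 4x4 patches, 12x12 attention windows, and 384x384 input. The sketch below works out the resulting feature-map sizes per stage. It is an illustration rather than the official configuration: the base width of 128 channels and the four-stage layout are assumptions taken from the standard Swin-B design and are not stated on this page, and stage_shapes is a hypothetical helper.

```python
IMAGE_SIZE = 384   # "384" in the model name
PATCH_SIZE = 4     # "Patch4"
WINDOW_SIZE = 12   # "Window12"
EMBED_DIM = 128    # assumption: Swin-B base channel width, not stated in this card
NUM_STAGES = 4     # assumption: standard four-stage Swin layout


def stage_shapes(image_size, patch_size, window_size, embed_dim, num_stages):
    """Return the token grid, channel width, and window count for each stage."""
    shapes = []
    grid = image_size // patch_size   # tokens per side after patch embedding
    channels = embed_dim
    for stage in range(1, num_stages + 1):
        shapes.append({
            "stage": stage,
            "grid": grid,
            "channels": channels,
            "windows_per_side": grid // window_size,
        })
        grid //= 2      # patch merging halves each spatial side
        channels *= 2   # and doubles the channel width
    return shapes


for shape in stage_shapes(IMAGE_SIZE, PATCH_SIZE, WINDOW_SIZE, EMBED_DIM, NUM_STAGES):
    print(shape)
```

Under these assumptions the token grid shrinks from 96x96 to 12x12 across the four stages, so the last stage fits in a single 12x12 window.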

Model Features

Hierarchical Feature Maps
Constructs hierarchical feature maps by merging image patches in deeper layers, enhancing the model's ability to capture features at different scales.
Local Window Self-Attention
Computes self-attention only within fixed-size local windows, so the cost grows linearly with the number of image patches rather than quadratically.
Shifted Window Mechanism
Employs a shifted window design to allow cross-window information interaction while maintaining computational efficiency.
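The window and shift mechanisms described above can be sketched in plain Python. This is a simplified illustration, not the model's implementation: window_of and attention_pairs are hypothetical helpers, the 96x96 grid assumes 384 / 4 tokens per side at the first stage, and the shift of half a window is the convention from the Swin design rather than something stated on this page. A real implementation also masks attention between tokens that only become neighbors through wrap-around, which this sketch omits.

```python
def window_of(row, col, grid, window, shift=0):
    """Return the (window_row, window_col) that a token falls into.

    A cyclic shift by `shift` tokens is applied first, mirroring a roll
    of the feature map toward the top-left corner.
    """
    r = (row - shift) % grid
    c = (col - shift) % grid
    return (r // window, c // window)


def attention_pairs(grid, window=None):
    """Count query-key pairs for one attention layer on a grid x grid map."""
    tokens = grid * grid
    if window is None:
        return tokens * tokens        # global attention: every token attends to all tokens
    return tokens * window * window   # windowed: every token attends only within its window


GRID, WINDOW = 96, 12  # assumed first-stage geometry for this checkpoint

# Two horizontally adjacent tokens that sit on either side of a window boundary.
a, b = (20, 11), (20, 12)
regular = (window_of(*a, GRID, WINDOW), window_of(*b, GRID, WINDOW))
shifted = (window_of(*a, GRID, WINDOW, WINDOW // 2),
           window_of(*b, GRID, WINDOW, WINDOW // 2))
print(regular)  # ((1, 0), (1, 1)): different windows, no interaction
print(shifted)  # ((1, 0), (1, 0)): same window after the shift

# Doubling the grid side quadruples the token count.
for grid in (48, 96):
    print(grid, attention_pairs(grid), attention_pairs(grid, WINDOW))
```

Going from a 48x48 to a 96x96 grid multiplies the windowed pair count by 4, in step with the token count, while the global count grows by 16.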

Model Capabilities

Image Classification
Visual Feature Extraction

Use Cases

Computer Vision
ImageNet Image Classification
Classifies input images into one of 1000 ImageNet categories.
Dense Recognition Tasks
Serves as a backbone for dense recognition tasks like object detection and semantic segmentation.
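A minimal classification sketch follows, assuming the checkpoint is published on the Hugging Face Hub as microsoft/swin-base-patch4-window12-384 and that torch, Pillow, and transformers are installed. The identifier and the helper names (softmax, top_k, classify) are assumptions for illustration and do not appear on this page.

```python
import math

# Assumption: Hub identifier inferred from the model name and developer.
MODEL_ID = "microsoft/swin-base-patch4-window12-384"


def softmax(logits):
    """Convert raw scores to probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, id2label, k=5):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:k]
    return [(id2label[i], probs[i]) for i in ranked]


def classify(image_path, k=5):
    """Classify one image file into the ImageNet-1k label set."""
    # Heavy dependencies are imported here so the helpers above work without them.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, AutoModelForImageClassification

    processor = AutoImageProcessor.from_pretrained(MODEL_ID)
    model = AutoModelForImageClassification.from_pretrained(MODEL_ID)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")  # resizes and normalizes
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()         # one score per class
    return top_k(logits, model.config.id2label, k)
```

Calling classify("cat.jpg") with a local image path of your own would return the five highest-scoring labels with their probabilities. For dense recognition, the same checkpoint would typically be loaded without the classification head and its per-stage feature maps passed to a detection or segmentation head.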