
Swinv2 Small Patch4 Window16 256

Developed by: Microsoft
Swin Transformer v2 is a vision Transformer model that achieves efficient image processing through hierarchical feature maps and local window self-attention mechanisms.
Downloads: 315
Release Time: 6/15/2022

Model Overview

This model is pre-trained on the ImageNet-1k dataset at 256×256 resolution and is suited to image classification tasks. It incorporates the Swin Transformer v2 improvements: residual post-normalization, scaled cosine attention, and a log-spaced continuous position bias.
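A minimal inference sketch using the Hugging Face `transformers` library is shown below. The checkpoint name `microsoft/swinv2-small-patch4-window16-256` is assumed from this page's title, and the blank test image is a stand-in for a real photo.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, Swinv2ForImageClassification

# Checkpoint name assumed from this page's title.
model_id = "microsoft/swinv2-small-patch4-window16-256"
processor = AutoImageProcessor.from_pretrained(model_id)
model = Swinv2ForImageClassification.from_pretrained(model_id)

# Stand-in image; replace with Image.open("your_photo.jpg").
image = Image.new("RGB", (256, 256), color="white")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape (1, 1000): ImageNet-1k classes

predicted = logits.argmax(-1).item()
print(model.config.id2label[predicted])
```

The predicted index maps into the 1,000 ImageNet-1k labels via `model.config.id2label`.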

Model Features

Hierarchical Feature Maps
Constructs hierarchical feature maps by merging image patches, adapting to visual tasks at different scales.
Local Window Self-Attention
Computes self-attention only within local windows, resulting in linear computational complexity relative to input image size.
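To illustrate why this is linear, the sketch below partitions an H×W feature map into non-overlapping M×M windows and runs plain softmax self-attention inside each one. It is illustrative only, not the library implementation: it skips the query/key/value projections, multi-head splitting, and shifted windows, so attention cost is O(H·W·M²) instead of O((H·W)²).

```python
import numpy as np

def window_partition(x, M):
    """Split an (H, W, C) feature map into non-overlapping M x M windows."""
    H, W, C = x.shape
    x = x.reshape(H // M, M, W // M, M, C)
    # -> (num_windows, M*M, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, M * M, C)

def window_self_attention(x, M):
    """Softmax self-attention computed independently per window.
    Simplification: the tokens themselves serve as Q, K, and V."""
    windows = window_partition(x, M)                       # (nW, M*M, C)
    scale = windows.shape[-1] ** -0.5
    attn = windows @ windows.transpose(0, 2, 1) * scale    # (nW, M*M, M*M)
    attn = np.exp(attn - attn.max(-1, keepdims=True))      # stable softmax
    attn /= attn.sum(-1, keepdims=True)
    return attn @ windows                                  # (nW, M*M, C)

# A 64x64 map with window size 16 yields 16 windows of 256 tokens each;
# cost grows with the number of windows, i.e. linearly in image area.
feat = np.random.rand(64, 64, 96)
out = window_self_attention(feat, 16)
print(out.shape)  # (16, 256, 96)
```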
Training Stability Improvements
Combines residual post-normalization and cosine attention to enhance training stability.
High-Resolution Transfer
Employs log-spaced continuous position bias to effectively support high-resolution inputs.
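The log-spacing idea can be sketched numerically: each relative offset d is mapped to sign(d)·log2(1 + |d|), which compresses large offsets so a position-bias network trained at one window size extrapolates to larger ones. This is a sketch of the published transform only; in the released model these coordinates feed a small MLP that produces the bias values, and the exact normalization constants differ.

```python
import numpy as np

def log_spaced_coords(window_size):
    """Log-spaced relative coordinates, sign(d) * log2(1 + |d|),
    normalized here by the largest offset so values land in [-1, 1]."""
    M = window_size
    d = np.arange(-(M - 1), M)                 # relative offsets -(M-1)..(M-1)
    dy, dx = np.meshgrid(d, d, indexing="ij")
    coords = np.stack([dy, dx], axis=-1).astype(np.float64)
    coords = np.sign(coords) * np.log2(1 + np.abs(coords))
    return coords / np.log2(M)                 # illustrative normalization

coords = log_spaced_coords(16)
print(coords.shape)  # (31, 31, 2)
```

Because offsets grow logarithmically, doubling the window size only adds a constant increment to the coordinate range, which is what makes high-resolution transfer well-behaved.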

Model Capabilities

Image Classification
Visual Feature Extraction

Use Cases

Computer Vision
Object Recognition
Identifies object categories in images, such as animals and everyday items.
Classifies among the 1,000 ImageNet-1k categories.
Scene Classification
Classifies image scenes, such as buildings and natural landscapes.