
SwinV2-Large Patch4 Window12to24 192to384 22kto1k-FT

Developed by Microsoft
Swin Transformer V2 is a vision Transformer pre-trained on ImageNet-21k and fine-tuned on ImageNet-1k at 384×384 resolution. It builds hierarchical feature maps and computes self-attention within local windows.
Downloads: 3,048
Released: 6/16/2022

Model Overview

This model is primarily used for image classification. By building hierarchical feature maps and restricting self-attention to local windows, it keeps computational cost low and suits a range of visual recognition tasks.
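The linear-complexity claim can be sanity-checked with a back-of-envelope count of query-key pairs. This is a minimal sketch; `attention_pairs` is an illustrative helper, not part of the model:

```python
def attention_pairs(height, width, window=None):
    """Query-key pairs scored by self-attention over a height x width token
    grid. window=None models global attention; an integer models SwinV2-style
    non-overlapping window attention (window must divide both sides here)."""
    tokens = height * width
    if window is None:
        return tokens ** 2                                  # quadratic in area
    num_windows = (height // window) * (width // window)
    return num_windows * (window ** 2) ** 2                 # linear in area

# Quadrupling the image area (24x24 -> 48x48 tokens):
global_growth = attention_pairs(48, 48) / attention_pairs(24, 24)            # 16x
windowed_growth = attention_pairs(48, 48, 12) / attention_pairs(24, 24, 12)  # 4x
```

Global attention cost grows with the square of the token count, while window attention grows in proportion to it, which is why the model remains practical at 384×384 input.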

Model Features

Hierarchical Feature Maps
Constructs hierarchical feature maps by merging image patches at deeper layers, suitable for processing images at different resolutions.
Local Window Self-Attention
Computes self-attention only within local windows, making computational complexity linear with input image size, thereby improving efficiency.
Training Stability Improvements
Combines residual post-normalization with cosine attention to enhance training stability.
High-Resolution Transfer Capability
Uses log-spaced continuous position bias to effectively transfer low-resolution pre-trained models to high-resolution input tasks.
Self-Supervised Pre-training
Introduces SimMIM self-supervised pre-training method, reducing the need for large amounts of labeled images.
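The high-resolution transfer feature above rests on how relative offsets are re-mapped. The sketch below mirrors the coordinate normalization used in the official SwinV2 code (the function name and defaults are illustrative assumptions):

```python
import numpy as np

def log_spaced_coords(offsets, pretrained_window=12):
    """Map relative token offsets to log-spaced coordinates, following the
    normalization in the official SwinV2 code: offsets are scaled so the
    pretraining window's range maps to [-8, 8], then compressed with
    sign(x) * log2(1 + |x|) / log2(8)."""
    x = np.asarray(offsets, dtype=float)
    x = x / (pretrained_window - 1) * 8        # pretrain range -> [-8, 8]
    return np.sign(x) * np.log2(np.abs(x) + 1.0) / np.log2(8.0)

# The largest offset inside the 12x12 pretraining window maps close to 1.0 ...
inside = log_spaced_coords(11, pretrained_window=12)
# ... while the largest offset in a 24x24 fine-tuning window grows only mildly
# beyond it (~1.38), instead of doubling as linearly scaled coordinates would:
outside = log_spaced_coords(23, pretrained_window=12)
```

Because extrapolated coordinates stay close to the trained range, the continuous position bias MLP generalizes from the 12×12 pretraining window to the 24×24 fine-tuning window.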

Model Capabilities

Image Classification
Visual Feature Extraction
High-Resolution Image Processing

Use Cases

General Image Classification
ImageNet Classification
Classifies images into one of the 1000 ImageNet categories.
High-accuracy image classification capability.
Visual Recognition
Object Recognition
Identifies specific objects in images, such as animals and everyday items.
Accurately recognizes various common objects.
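A hedged usage sketch with Hugging Face `transformers` (assumed installed, along with `torch`). To stay runnable without downloading weights, it builds a tiny randomly initialized SwinV2; for real predictions, load the checkpoint named in the comment instead:

```python
import torch
from transformers import Swinv2Config, Swinv2ForImageClassification

# For the actual fine-tuned checkpoint (assumed Hub id, matching this card):
#   model = Swinv2ForImageClassification.from_pretrained(
#       "microsoft/swinv2-large-patch4-window12to24-192to384-22kto1k-ft")

# Tiny random-init config so the sketch runs offline; not the real model.
config = Swinv2Config(
    image_size=64, patch_size=4, embed_dim=32,
    depths=[2, 2], num_heads=[2, 4], window_size=4, num_labels=10,
)
model = Swinv2ForImageClassification(config).eval()

pixel_values = torch.randn(1, 3, 64, 64)    # stand-in for a preprocessed image
with torch.no_grad():
    logits = model(pixel_values=pixel_values).logits
predicted_class = logits.argmax(-1).item()  # index into the 10 toy labels
```

With the real checkpoint, images should first be resized and normalized with the matching `AutoImageProcessor`, and `logits` then scores the 1,000 ImageNet-1k classes.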