# High-Resolution Image Classification
## Mambavision L3 512 21K
**nvidia** · License: Other · Task: Image Classification · Library: Transformers · Downloads: 7,548 · Likes: 49

MambaVision is the first hybrid computer vision model combining the strengths of Mamba and Transformer. It enhances visual feature modeling by redesigning the Mamba formulation and incorporates self-attention modules in the final layers of the Mamba architecture to improve long-range spatial dependency modeling.
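Mamba layers are built on a linear state-space recurrence. The toy sketch below (plain NumPy, with fixed `A`, `B`, `C` rather than the input-dependent parameters a real selective scan uses) illustrates the kind of sequential scan such layers compute; it is not MambaVision's actual implementation:

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Minimal state-space recurrence: h_t = A @ h_{t-1} + B * x_t, y_t = C @ h_t.
    Illustrative only; Mamba's selective scan makes A, B, C input-dependent."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:
        h = A @ h + B * x_t   # update hidden state from input
        ys.append(C @ h)      # read out the output at this step
    return np.array(ys)

# Toy 1-D impulse input with a 2-dimensional hidden state.
x = np.array([1.0, 0.0, 0.0])
A = np.array([[0.5, 0.0],
              [0.0, 0.9]])   # decay rates of the two state channels
B = np.array([1.0, 1.0])
C = np.array([1.0, 1.0])
y = ssm_scan(x, A, B, C)     # impulse response decays over time
```

The impulse response decays geometrically per state channel, which is how such recurrences summarize long contexts at linear cost.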
## Mambavision L2 512 21K
**nvidia** · License: Other · Task: Image Classification · Library: Transformers · Downloads: 2,678 · Likes: 3

The first hybrid computer vision model combining the advantages of Mamba and Transformer, enhancing visual feature modeling by redesigning the Mamba formulation.
## Efficientnet B7
**google** · License: Apache-2.0 · Task: Image Classification · Library: Transformers · Downloads: 6,522 · Likes: 17

EfficientNet is a convolutional neural network that achieves high-accuracy image classification efficiently by uniformly scaling network depth, width, and input resolution.
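The "uniform scaling" above is EfficientNet's compound scaling rule: one coefficient φ scales depth, width, and resolution together. A minimal sketch using the base coefficients reported in the EfficientNet paper (α = 1.2, β = 1.1, γ = 1.15, chosen so that α·β²·γ² ≈ 2, i.e. each increment of φ roughly doubles FLOPs):

```python
# Compound scaling constants from the EfficientNet paper (for the B0 baseline).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi):
    """Return (depth, width, resolution) multipliers for coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi

# phi = 2: ~2x the FLOPs of phi = 1, split across all three dimensions.
d, w, r = compound_scale(2)
```

Scaling all three dimensions jointly is the paper's key observation; scaling only one (e.g. depth) saturates quickly.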
## Swinv2 Large Patch4 Window12to24 192to384 22kto1k Ft
**microsoft** · License: Apache-2.0 · Task: Image Classification · Library: Transformers · Downloads: 3,048 · Likes: 4

Swin Transformer v2 is a vision Transformer model pre-trained on ImageNet-21k and fine-tuned on ImageNet-1k at 384x384 resolution, featuring hierarchical feature maps and local window self-attention mechanisms.
## Swinv2 Large Patch4 Window12to16 192to256 22kto1k Ft
**microsoft** · License: Apache-2.0 · Task: Image Classification · Library: Transformers · Downloads: 812 · Likes: 4

Swin Transformer v2 is a vision Transformer model that handles efficient image classification and dense recognition tasks through hierarchical feature maps and local window self-attention mechanisms.
## Swinv2 Base Patch4 Window12to16 192to256 22kto1k Ft
**microsoft** · License: Apache-2.0 · Task: Image Classification · Library: Transformers · Downloads: 459 · Likes: 1

Swin Transformer v2 is a vision Transformer model that achieves efficient image classification through hierarchical feature maps and local window-based self-attention mechanisms.
## Swinv2 Base Patch4 Window16 256
**microsoft** · License: Apache-2.0 · Task: Image Classification · Library: Transformers · Downloads: 1,853 · Likes: 3

Swin Transformer v2 is a vision Transformer model that handles efficient image classification and dense recognition tasks through hierarchical feature maps and local window self-attention mechanisms.
## Swinv2 Tiny Patch4 Window8 256
**microsoft** · License: Apache-2.0 · Task: Image Classification · Library: Transformers · Downloads: 25.04k · Likes: 10

Swin Transformer v2 is a vision Transformer model pre-trained on ImageNet-1k, featuring hierarchical feature maps and local window self-attention mechanisms with linear computational complexity.
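The local window self-attention shared by these Swin v2 variants rests on splitting the feature map into non-overlapping windows and attending only within each one, which is where the linear complexity in image area comes from. A minimal NumPy sketch of that partition (illustrative only, not the library's implementation):

```python
import numpy as np

def window_partition(x, window_size):
    """Split an (H, W, C) feature map into non-overlapping windows, returning
    an array of shape (num_windows, window_size, window_size, C).
    Attention is then computed inside each window independently, so the
    cost grows linearly with the number of windows (i.e. with image area)."""
    H, W, C = x.shape
    x = x.reshape(H // window_size, window_size,
                  W // window_size, window_size, C)
    # Group the two window-grid axes together, then flatten them.
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, window_size, window_size, C)

# An 8x8 single-channel feature map partitioned into four 4x4 windows.
feat = np.arange(8 * 8, dtype=float).reshape(8, 8, 1)
windows = window_partition(feat, 4)
```

Global self-attention would instead cost quadratically in the number of tokens, which is why windowing matters at high resolutions like 384x384.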
## Cvt 21 384 22k
**microsoft** · License: Apache-2.0 · Task: Image Classification · Library: Transformers · Downloads: 134 · Likes: 3

CvT-21 is a vision model combining convolutional and Transformer architectures, pre-trained on ImageNet-22k and fine-tuned on ImageNet-1k.
## Cvt 13 384
**microsoft** · License: Apache-2.0 · Task: Image Classification · Library: Transformers · Downloads: 27 · Likes: 0

CvT-13 is a vision transformer model pre-trained on the ImageNet-1k dataset, improving the performance of traditional vision transformers by introducing convolutional operations.
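One place CvT introduces convolutions is token embedding: instead of a plain linear patch projection, tokens are produced by convolution-style strided windows over the feature map. A toy single-channel sketch (averaging kernel, no learned weights — not CvT's actual layers):

```python
import numpy as np

def conv_token_embed(img, kernel, stride):
    """Slide a kernel x kernel window over an (H, W) image with the given
    stride and reduce each window to one token value (here a simple mean,
    standing in for a learned convolution). Returns a flat token sequence."""
    H, W = img.shape
    tokens = []
    for i in range(0, H - kernel + 1, stride):
        for j in range(0, W - kernel + 1, stride):
            tokens.append(img[i:i + kernel, j:j + kernel].mean())
    return np.array(tokens)

# A 4x4 image with a 2x2 window and stride 2 yields four tokens.
img = np.arange(16, dtype=float).reshape(4, 4)
tokens = conv_token_embed(img, kernel=2, stride=2)
```

Because the windows can overlap (stride smaller than kernel), such embeddings retain local spatial structure that a non-overlapping patch projection discards.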
## Swin Base Patch4 Window12 384 In22k
**microsoft** · License: Apache-2.0 · Task: Image Classification · Library: Transformers · Downloads: 2,431 · Likes: 1

Swin Transformer is a hierarchical vision Transformer based on shifted windows, specifically designed for image classification tasks.
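The "shifted windows" that give Swin Transformer its name come from cyclically shifting the feature map between successive blocks, so that the next block's window partition straddles the previous one's window borders and information can flow between windows. A minimal sketch of the shift using `np.roll`:

```python
import numpy as np

def cyclic_shift(x, shift):
    """Cyclically roll an (H, W, C) feature map by `shift` pixels along both
    spatial axes, as done between windowed-attention blocks in Swin.
    The inverse shift (+shift) restores the original layout afterwards."""
    return np.roll(x, shift=(-shift, -shift), axis=(0, 1))

# A 4x4 map shifted by half a 4-wide window (shift = 2): the pixel that was
# at (2, 2) moves to (0, 0), so new windows span old window boundaries.
feat = np.arange(16, dtype=float).reshape(4, 4, 1)
shifted = cyclic_shift(feat, 2)
```

In the full model, an attention mask keeps the wrapped-around regions from attending to spatially distant pixels; this sketch shows only the shift itself.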