MambaVision L2 512 21K

Developed by NVIDIA
The first hybrid computer vision model to combine the advantages of Mamba and Transformer, improving visual feature modeling by redesigning the Mamba formulation
Downloads 2,678
Release Time: 3/24/2025

Model Overview

MambaVision is a hybrid computer vision model that combines the strengths of the Mamba and Transformer architectures and is optimized for visual feature modeling. This variant is pre-trained on ImageNet-21K and fine-tuned on ImageNet-1K at 512×512 resolution, where it reaches 87.3% Top-1 accuracy on image classification.

Model Features

Hybrid Architecture Innovation
Combines the advantages of the Mamba and Transformer architectures for the first time, redesigning the Mamba formulation to improve visual feature modeling
Hierarchical Architecture Design
Uses a hierarchical architecture and adds self-attention modules in the final layers of the Mamba architecture, which significantly improves modeling of long-range spatial dependencies
High-Performance Results
Sets a new SOTA Pareto frontier for Top-1 accuracy versus throughput, reaching 87.3% Top-1 accuracy
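
The layer ordering described under Hierarchical Architecture Design, with Mamba-style mixers first and self-attention in the final layers, can be pictured with a short layout sketch. It is a minimal illustration only: the helper name `stage_layout`, the stage depth of 8, and the even split between mixer types are assumptions chosen for the example, not values taken from the released model.

```python
def stage_layout(depth, attention_layers):
    """Return the mixer type of each layer in one stage.

    Hypothetical helper: the first layers use a Mamba-style mixer and the
    last `attention_layers` layers use self-attention.
    """
    if not 0 <= attention_layers <= depth:
        raise ValueError("attention_layers must be between 0 and depth")
    mamba_layers = depth - attention_layers
    return ["mamba"] * mamba_layers + ["attention"] * attention_layers


# Assumed example: a stage of 8 layers whose final 4 layers use self-attention.
layout = stage_layout(depth=8, attention_layers=4)
print(layout)
```

Placing attention last means every attention layer operates on features the earlier mixers have already processed, which matches the stated goal of capturing long-range spatial dependencies late in the network.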

Model Capabilities

Image Classification
Visual Feature Extraction

Use Cases

Computer Vision
General Image Classification
Classifies input images to identify the main objects or scenes within them
Achieves 87.3% Top-1 accuracy on ImageNet-1K
Visual Feature Extraction
Serves as a general feature extractor, returning feature maps from four stages along with the final average-pooled features
Provides feature representations at different levels for use in downstream vision tasks
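
To make the four-stage output concrete, the sketch below computes the spatial size of each stage's feature map for a 512×512 input and applies global average pooling to a toy feature map. The strides (4, 8, 16, 32) are an assumption typical of hierarchical backbones, not figures confirmed for this model, and `stage_resolutions` and `global_average_pool` are illustrative names rather than part of any library.

```python
def stage_resolutions(image_size, strides=(4, 8, 16, 32)):
    """Spatial side length of each stage's feature map (assumed strides)."""
    return [image_size // s for s in strides]


def global_average_pool(feature_map):
    """Average each channel of a [channels][height][width] nested list."""
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


sizes = stage_resolutions(512)
print(sizes)  # [128, 64, 32, 16]

# Toy feature map: 2 channels, each 2x2.
toy = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [2.0, 2.0]],
]
pooled = global_average_pool(toy)
print(pooled)  # [2.5, 1.0]
```

Pooling collapses each channel to a single number, which is why the final pooled feature is one vector per image while the per-stage maps keep their spatial layout for tasks such as detection or segmentation.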