MambaVision L 21K

Developed by NVIDIA
MambaVision is a hybrid Mamba-Transformer backbone for vision applications. It combines the strengths of the Mamba formulation and Vision Transformers, and delivers strong performance in image classification and downstream vision tasks.
Downloads 571
Release Time: 3/24/2025

Model Overview

MambaVision is a hybrid Mamba-Transformer backbone network. It redesigns the Mamba formulation to model visual features more effectively and adds self-attention blocks in the final layers to capture long-range spatial dependencies. The model achieves SOTA performance on the ImageNet-1K classification task and performs well on downstream tasks such as object detection, instance segmentation, and semantic segmentation.

Model Features

Hybrid Architecture Design
Combines the strengths of Mamba and Vision Transformers, with a redesigned Mamba formulation that improves visual feature modeling.
Hierarchical Structure
Uses a hierarchical, multi-stage architecture, with self-attention blocks in the final layers to capture long-range spatial dependencies.
High Performance
Achieves 86.1% Top-1 accuracy on the ImageNet-1K classification task and excels in downstream vision tasks.
Efficient Inference
Sets a SOTA Pareto front for Top-1 accuracy versus throughput, balancing accuracy and efficiency.
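
To make the accuracy/throughput Pareto claim concrete, here is a minimal sketch of how a Pareto front over a set of models could be computed. The function name `pareto_front` and every model name and number in it are hypothetical and for illustration only; they are not measured results for MambaVision or any other backbone.

```python
from typing import List, Tuple

# (name, throughput in images/s, Top-1 accuracy in %)
Model = Tuple[str, float, float]


def pareto_front(models: List[Model]) -> List[Model]:
    """Return the models that no other model matches or beats on both axes.

    A model is dominated if another model is at least as good on throughput
    and accuracy, and strictly better on at least one of them.
    """
    front = []
    for name, thr, acc in models:
        dominated = any(
            o_thr >= thr and o_acc >= acc and (o_thr > thr or o_acc > acc)
            for _, o_thr, o_acc in models
        )
        if not dominated:
            front.append((name, thr, acc))
    # Sort by throughput so the front reads from slowest to fastest.
    return sorted(front, key=lambda m: m[1])


# Hypothetical values, NOT benchmark results.
candidates = [
    ("model_a", 1000.0, 84.0),
    ("model_b", 2000.0, 83.0),
    ("model_c", 1500.0, 82.5),  # slower and less accurate than model_b
    ("model_d", 3000.0, 81.0),
]

front = pareto_front(candidates)
print([name for name, _, _ in front])  # ['model_a', 'model_b', 'model_d']
```

A backbone is on the front when no alternative is both faster and more accurate, which is the sense in which a model family can be said to balance accuracy and efficiency.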

Model Capabilities

Image Classification
Feature Extraction
Object Detection
Instance Segmentation
Semantic Segmentation

Use Cases

Computer Vision
Image Classification
Classifies input images to identify the main object categories.
Achieves 86.1% Top-1 accuracy on ImageNet-1K.
Feature Extraction
Extracts multi-level image features for downstream vision tasks.
Exposes the outputs of each of the 4 stages as well as the final average-pooled features.
Object Detection
Serves as a backbone network for object detection tasks.
Outperforms backbone networks of similar scale on the MS COCO dataset.
Semantic Segmentation
Serves as a backbone network for semantic segmentation tasks.
Outperforms backbone networks of similar scale on the ADE20K dataset.
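
The Top-1 accuracy quoted for image classification is the share of images whose highest-scoring class matches the ground-truth label. The sketch below shows that calculation on plain Python lists. The helper name `top1_accuracy` and the toy logits and labels are assumptions made for illustration; they are not part of any MambaVision API and are not model outputs.

```python
from typing import Sequence


def top1_accuracy(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Return Top-1 accuracy in percent for per-image class scores."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    if not labels:
        raise ValueError("at least one example is required")

    correct = 0
    for scores, label in zip(logits, labels):
        # Index of the highest score is the predicted class.
        predicted = max(range(len(scores)), key=scores.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


# Toy scores for 4 images over 3 classes (hypothetical, not model outputs).
toy_logits = [
    [0.1, 2.3, -1.0],  # predicts class 1
    [1.7, 0.2, 0.4],   # predicts class 0
    [0.0, 0.5, 0.9],   # predicts class 2
    [2.0, 1.9, -0.3],  # predicts class 0
]
toy_labels = [1, 0, 1, 0]

print(top1_accuracy(toy_logits, toy_labels))  # 75.0
```

In a real evaluation the scores would come from the classifier's output over the ImageNet-1K validation set rather than from hand-written lists.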