Mit B2

Developed by nvidia
SegFormer is a Transformer-based semantic segmentation model. This checkpoint contains its encoder, pretrained on ImageNet-1k.
Downloads 13.86k
Release Time: 3/2/2022

Model Overview

SegFormer pairs a hierarchical Transformer encoder with a lightweight all-MLP decoder head for semantic segmentation. This release contains only the pretrained hierarchical Transformer encoder, intended as a starting point for fine-tuning.
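Because this release ships only the encoder, a decoder head has to be attached and trained before the model can segment anything. The following is a minimal sketch of one way to do that, assuming the Hugging Face `transformers` library (with PyTorch) is installed and that the checkpoint is published on the Hub as `nvidia/mit-b2`. The Hub ID and the helper names `make_label_maps` and `load_for_finetuning` are assumptions for illustration and do not appear on this page.

```python
MODEL_ID = "nvidia/mit-b2"  # assumed Hub ID for this checkpoint


def make_label_maps(class_names):
    """Build the id2label / label2id dictionaries stored in the model config."""
    id2label = {i: name for i, name in enumerate(class_names)}
    label2id = {name: i for i, name in id2label.items()}
    return id2label, label2id


def load_for_finetuning(class_names, model_id=MODEL_ID):
    """Load the pretrained encoder and attach a new, untrained decoder head."""
    # Imported lazily so the helper above works without transformers/torch installed.
    from transformers import SegformerForSemanticSegmentation

    id2label, label2id = make_label_maps(class_names)
    return SegformerForSemanticSegmentation.from_pretrained(
        model_id,
        num_labels=len(class_names),
        id2label=id2label,
        label2id=label2id,
    )
```

Calling `load_for_finetuning(["road", "sky"])` would download the encoder weights and initialize the decoder head randomly, so the returned model needs training on a labeled segmentation dataset before its predictions are meaningful.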

Model Features

Hierarchical Transformer Architecture
Uses a hierarchical Transformer encoder that produces visual features at multiple scales.
Lightweight MLP Decoder Head
The full SegFormer model pairs the encoder with a lightweight all-MLP decoder head, which keeps segmentation efficient. This checkpoint does not include that head.
ImageNet Pretraining
The encoder is pretrained on the ImageNet-1k dataset, providing a solid foundation for feature extraction.
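The multi-scale encoder and the all-MLP head described above can be illustrated with plain shape arithmetic. The sketch below assumes the stage settings commonly reported for the B2 variant: overlapping patch embeddings with kernel sizes 7, 3, 3, 3, strides 4, 2, 2, 2, channel widths 64, 128, 320, 512, and a decoder width of 768. None of these values are stated on this page, so check them against the checkpoint's configuration. The function names are hypothetical.

```python
# Assumed MiT-B2 stage settings (not stated on this page): (kernel, stride, channels).
STAGES = [(7, 4, 64), (3, 2, 128), (3, 2, 320), (3, 2, 512)]
DECODER_DIM = 768  # assumed width of the all-MLP head for the B2 variant


def conv_out(size, kernel, stride):
    """Output length of a strided convolution with padding = kernel // 2."""
    padding = kernel // 2
    return (size + 2 * padding - kernel) // stride + 1


def encoder_feature_shapes(height, width):
    """Return (channels, height, width) for each encoder stage's output."""
    shapes = []
    h, w = height, width
    for kernel, stride, channels in STAGES:
        h, w = conv_out(h, kernel, stride), conv_out(w, kernel, stride)
        shapes.append((channels, h, w))
    return shapes


def decoder_shapes(height, width, num_classes):
    """Trace shapes through an all-MLP head: project, upsample, concatenate, fuse, classify."""
    feats = encoder_feature_shapes(height, width)
    _, h4, w4 = feats[0]  # every stage is upsampled to the 1/4-resolution grid
    return {
        "projected": [(DECODER_DIM, h4, w4) for _ in feats],
        "concatenated": (DECODER_DIM * len(feats), h4, w4),
        "fused": (DECODER_DIM, h4, w4),
        "logits": (num_classes, h4, w4),
    }


if __name__ == "__main__":
    for shape in encoder_feature_shapes(512, 512):
        print(shape)
    print(decoder_shapes(512, 512, num_classes=150)["logits"])
```

Under these assumptions, a 512×512 input yields feature maps at 1/4, 1/8, 1/16, and 1/32 of the input resolution, and the head predicts class logits on the 1/4-resolution grid.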

Model Capabilities

Image Semantic Segmentation
Visual Feature Extraction
Downstream Task Fine-tuning

Use Cases

Computer Vision
Scene Understanding
Semantic segmentation on scene datasets like ADE20K
Once fine-tuned, SegFormer models perform well on benchmarks such as ADE20K and Cityscapes
Image Analysis
Extracting object and region information from images