Open-source mit-b2/SegFormer Semantic Segmentation Model - Free Deployment for Precise Image Scene Recognition

Mit B2

Developed by nvidia

SegFormer is a Transformer-based semantic segmentation model. This checkpoint contains only the encoder, pre-trained on ImageNet-1k.

Image Segmentation

Transformers

Open Source License: Other

#Semantic Segmentation Pretraining #Transformer Encoder #ADE20K Adaptation

Downloads 13.86k

Release Time : 3/2/2022

Model Overview

SegFormer consists of a hierarchical Transformer encoder and a lightweight all-MLP decoder head, focusing on semantic segmentation tasks. This version includes only the pretrained hierarchical Transformer for fine-tuning purposes.

Model Features

Hierarchical Transformer Architecture

Adopts a hierarchically designed Transformer encoder capable of effectively processing visual features at different scales.

Lightweight MLP Decoder Head

Paired with a lightweight all-MLP decoder head to achieve excellent semantic segmentation performance while maintaining efficiency.

ImageNet Pretraining

The encoder is pretrained on the ImageNet-1k dataset, providing a solid foundation for feature extraction.
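
To make the multi-scale behaviour of the hierarchical encoder concrete, the following is a minimal sketch that computes the feature-map shape after each of the four encoder stages. The channel widths and the kernel/stride values are assumptions based on the commonly published MiT-b2 configuration; they are not stated on this page, and the function name `stage_shapes` is purely illustrative.

```python
# Assumed MiT-b2 settings (not stated on this page): per-stage channel widths,
# and overlapping patch embeddings with kernel/stride pairs (7, 4), (3, 2), (3, 2), (3, 2).
HIDDEN_SIZES = [64, 128, 320, 512]
PATCH_SIZES = [7, 3, 3, 3]
STRIDES = [4, 2, 2, 2]


def stage_shapes(height, width):
    """Return (channels, height, width) of the feature map after each encoder stage."""
    shapes = []
    for channels, kernel, stride in zip(HIDDEN_SIZES, PATCH_SIZES, STRIDES):
        padding = kernel // 2
        # Standard convolution output-size formula.
        height = (height + 2 * padding - kernel) // stride + 1
        width = (width + 2 * padding - kernel) // stride + 1
        shapes.append((channels, height, width))
    return shapes


for stage, shape in enumerate(stage_shapes(512, 512), start=1):
    print(f"stage {stage}: {shape}")
```

Under these assumptions, a 512×512 input yields feature maps at 1/4, 1/8, 1/16 and 1/32 of the input resolution, which is what a decode head would combine.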

Model Capabilities

Image Semantic Segmentation

Visual Feature Extraction

Downstream Task Fine-tuning

Use Cases

Computer Vision

Scene Understanding

Semantic segmentation on scene datasets like ADE20K

Once a decode head is added and fine-tuned, SegFormer achieves strong results on benchmarks such as ADE20K and Cityscapes

Image Analysis

Extracting object and region information from images

🚀 SegFormer (b2-sized) encoder pre-trained-only

SegFormer encoder pre-trained on ImageNet-1k, intended as a starting point for fine-tuning on semantic segmentation.

🚀 Quick Start

This SegFormer encoder is pre-trained on ImageNet-1k. It was introduced in the paper SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers by Xie et al. and first released in this repository.

Disclaimer: The team releasing SegFormer did not write a model card for this model, so this model card has been written by the Hugging Face team.

✨ Features

SegFormer consists of a hierarchical Transformer encoder and a lightweight all-MLP decode head, and achieves strong results on semantic segmentation benchmarks such as ADE20K and Cityscapes. The hierarchical Transformer is first pre-trained on ImageNet-1k, after which a decode head is added and the two are fine-tuned together on a downstream dataset.

This repository contains only the pre-trained hierarchical Transformer, so it is intended as a starting point for fine-tuning.

📚 Documentation

Intended uses & limitations

You can fine-tune the model for semantic segmentation. See the model hub to look for versions already fine-tuned on a task that interests you.
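
As a rough illustration of what fine-tuning looks like, the sketch below attaches a decode head to a b2-sized encoder and runs a single training step on random data. Several things here are assumptions rather than details from this page: the label count of 150 (ADE20K-style), the b2 configuration values, the learning rate, and the random tensors standing in for a real dataset. The model is built from a config so the sketch runs offline; in practice you would more likely call SegformerForSemanticSegmentation.from_pretrained("nvidia/mit-b2", num_labels=...) to start from the pre-trained encoder weights.

```python
import torch
from transformers import SegformerConfig, SegformerForSemanticSegmentation

NUM_LABELS = 150  # assumption: ADE20K-style label set; use your dataset's class count

# Assumption: these values mirror the b2 variant as commonly configured.
# A config-only model has randomly initialised weights (no download needed).
config = SegformerConfig(
    depths=[3, 4, 6, 3],
    hidden_sizes=[64, 128, 320, 512],
    decoder_hidden_size=768,
    num_labels=NUM_LABELS,
)
model = SegformerForSemanticSegmentation(config)
model.train()

# Random stand-ins for a batch of images and per-pixel class labels.
pixel_values = torch.randn(2, 3, 64, 64)
labels = torch.randint(0, NUM_LABELS, (2, 64, 64))

# Assumption: learning rate chosen for illustration only.
optimizer = torch.optim.AdamW(model.parameters(), lr=6e-5)

# Passing labels makes the model compute a cross-entropy loss;
# logits come out at 1/4 of the input resolution.
outputs = model(pixel_values=pixel_values, labels=labels)
loss = outputs.loss
logits = outputs.logits

loss.backward()
optimizer.step()
optimizer.zero_grad()

print("loss:", loss.item())
print("logits shape:", tuple(logits.shape))
```

A real run would replace the random tensors with batches from a dataset prepared by the image processor, and loop over many steps.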

How to use

Here is how to use this model to classify an image from the COCO 2017 dataset into one of the 1,000 ImageNet classes:

from transformers import SegformerImageProcessor, SegformerForImageClassification
from PIL import Image
import requests
import torch

# example image from the COCO 2017 validation set
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# SegformerImageProcessor replaces the deprecated SegformerFeatureExtractor
image_processor = SegformerImageProcessor.from_pretrained("nvidia/mit-b2")
model = SegformerForImageClassification.from_pretrained("nvidia/mit-b2")

inputs = image_processor(images=image, return_tensors="pt")
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)
logits = outputs.logits
# model predicts one of the 1000 ImageNet classes
predicted_class_idx = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])
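
If you want the few most likely classes rather than only the argmax, the logits can be turned into probabilities with a softmax and then ranked. The helper below is a small sketch in plain Python; the name `top_k` is hypothetical and not part of the transformers library, and it expects the logits as a flat list of floats.

```python
import math


def top_k(logits, id2label, k=5):
    """Return the k most likely (label, probability) pairs from raw logits."""
    # Numerically stable softmax: subtract the maximum before exponentiating.
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:k]
    return [(id2label[i], probs[i]) for i in ranked]


# Toy example with three made-up classes.
toy_labels = {0: "cat", 1: "dog", 2: "bird"}
for label, prob in top_k([1.0, 3.0, 2.0], toy_labels, k=2):
    print(f"{label}: {prob:.3f}")
```

With the classification example above, this could be called as top_k(logits[0].tolist(), model.config.id2label).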

For more code examples, we refer to the documentation.

Information Table

Property	Details
Model Type	SegFormer (b2-sized) encoder pre-trained-only
Training Data	ImageNet-1k

License

The license for this model can be found here.

BibTeX entry and citation info

@article{DBLP:journals/corr/abs-2105-15203,
  author    = {Enze Xie and
               Wenhai Wang and
               Zhiding Yu and
               Anima Anandkumar and
               Jose M. Alvarez and
               Ping Luo},
  title     = {SegFormer: Simple and Efficient Design for Semantic Segmentation with
               Transformers},
  journal   = {CoRR},
  volume    = {abs/2105.15203},
  year      = {2021},
  url       = {https://arxiv.org/abs/2105.15203},
  eprinttype = {arXiv},
  eprint    = {2105.15203},
  timestamp = {Wed, 02 Jun 2021 11:46:42 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2105-15203.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
