mit-b1 SegFormer Open-Source Semantic Segmentation Model - Free to Implement Precise and Fast Image Segmentation Applications

MiT-B1

Developed by NVIDIA

SegFormer is a semantic segmentation model based on Transformer architecture, featuring a hierarchical encoder and lightweight MLP decoder design.

Image Segmentation

Transformers

Open Source License: Other

Tags: #Semantic Segmentation #Transformer Encoder #Lightweight MLP Decoder

Downloads: 7,305

Release Time: 3/2/2022

Model Overview

This model is the encoder part of SegFormer, pre-trained on ImageNet-1k. It is intended as a starting point for transfer learning on semantic segmentation tasks.

Model Features

Hierarchical Transformer Architecture

Uses a hierarchical design that extracts multi-scale features, capturing both fine and coarse visual detail

Lightweight MLP Decoder

Needs less computation and fewer parameters than traditional convolutional decoders (the decode head itself is not included in this checkpoint)

ImageNet Pretraining

Encoder pretrained on ImageNet-1k, with strong feature extraction capabilities
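
To make the hierarchical design above more concrete, the sketch below computes the feature-map shape each encoder stage would produce for a given input size. The kernel sizes, strides, padding, and channel widths are assumptions taken from the SegFormer paper and the commonly published MiT-B1 configuration, not from this card, and `stage_shapes` is a hypothetical helper name.

```python
# Assumed MiT-B1 stage settings (not stated in this card): each stage starts
# with an overlapping patch embedding implemented as a strided convolution.
STAGES = [
    # (kernel, stride, padding, output channels)
    (7, 4, 3, 64),
    (3, 2, 1, 128),
    (3, 2, 1, 320),
    (3, 2, 1, 512),
]


def conv_out(size, kernel, stride, padding):
    """Standard output-size formula for a strided convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def stage_shapes(height, width):
    """Return (channels, height, width) for each of the four encoder stages."""
    shapes = []
    for kernel, stride, padding, channels in STAGES:
        height = conv_out(height, kernel, stride, padding)
        width = conv_out(width, kernel, stride, padding)
        shapes.append((channels, height, width))
    return shapes


for i, (channels, h, w) in enumerate(stage_shapes(512, 512), start=1):
    print(f"stage {i}: channels={channels}, height={h}, width={w}")
```

Under these assumptions a 512x512 input yields maps at 1/4, 1/8, 16 and 1/32 of the input resolution, which is the multi-scale pyramid a decode head would consume.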

Model Capabilities

Image Semantic Segmentation

Visual Feature Extraction

Transfer Learning

Use Cases

Computer Vision

Scene Understanding

Pixel-level semantic segmentation of indoor and outdoor scenes

Fine-tuned SegFormer models achieve strong results on benchmarks such as ADE20K and Cityscapes

Autonomous Driving

Road scene parsing and object recognition
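
As a rough illustration of what "pixel-level" means in the use cases above, a segmentation model produces one score per class for every pixel, and the predicted label map is the per-pixel argmax over classes. The sketch below uses plain Python lists and made-up scores; the function name `label_map` and the class names are hypothetical.

```python
def label_map(scores):
    """Convert scores[c][y][x] into labels[y][x] holding the best class index."""
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


# Toy example: 2 classes on a 2x2 image, with invented scores.
toy_scores = [
    [[0.9, 0.2], [0.4, 0.1]],  # class 0, e.g. a hypothetical "road" class
    [[0.1, 0.8], [0.6, 0.9]],  # class 1, e.g. a hypothetical "car" class
]
print(label_map(toy_scores))  # [[0, 1], [1, 1]]
```

In practice the same argmax is applied to a tensor of logits after upsampling it to the input resolution.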

🚀 SegFormer (b1-sized) encoder pre-trained-only

SegFormer encoder pre-trained on ImageNet-1k, intended as a simple and efficient backbone for semantic segmentation.

🚀 Quick Start

This SegFormer encoder is pre-trained on ImageNet-1k. It was introduced in the paper SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers by Xie et al. and first released in this repository.

Disclaimer: The team releasing SegFormer did not write a model card for this model so this model card has been written by the Hugging Face team.

✨ Features

SegFormer consists of a hierarchical Transformer encoder and a lightweight all-MLP decode head, and achieves strong results on semantic segmentation benchmarks such as ADE20K and Cityscapes. The hierarchical Transformer is first pre-trained on ImageNet-1k, after which a decode head is added and the whole model is fine-tuned on a downstream dataset.

This repository only contains the pre-trained hierarchical Transformer, hence it can be used for fine-tuning purposes.

📚 Documentation

Intended uses & limitations

You can use the model as a starting point for fine-tuning on semantic segmentation. See the model hub for versions already fine-tuned on a task that interests you.
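
A minimal sketch of how fine-tuning might be set up with the `transformers` library, assuming you have your own list of class names. The helper names `build_label_maps` and `load_for_finetuning` are hypothetical. Because this checkpoint contains only the encoder, the decode head created here would start from randomly initialized weights and has to be trained on your dataset.

```python
def build_label_maps(class_names):
    """Build the id2label / label2id dictionaries that the model config expects."""
    id2label = {i: name for i, name in enumerate(class_names)}
    label2id = {name: i for i, name in id2label.items()}
    return id2label, label2id


def load_for_finetuning(class_names, checkpoint="nvidia/mit-b1"):
    """Load the pre-trained encoder and attach a fresh segmentation head."""
    # Imported lazily so build_label_maps works without transformers installed.
    from transformers import SegformerForSemanticSegmentation

    id2label, label2id = build_label_maps(class_names)
    # The number of output classes is derived from id2label.
    return SegformerForSemanticSegmentation.from_pretrained(
        checkpoint,
        id2label=id2label,
        label2id=label2id,
    )


# Example call (downloads the checkpoint; requires transformers and torch):
# model = load_for_finetuning(["background", "road", "car"])
```

The class names in the example call are placeholders; training itself (data loading, loss, optimizer) is left out of this sketch.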

💻 Usage Examples

Basic Usage

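import requests
import torch
from PIL import Image
# SegformerImageProcessor replaces the deprecated SegformerFeatureExtractor;
# on older transformers releases, use SegformerFeatureExtractor instead.
from transformers import SegformerImageProcessor, SegformerForImageClassification

# Download a sample image from the COCO validation set
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Load the preprocessing settings and the ImageNet-1k classification model
image_processor = SegformerImageProcessor.from_pretrained("nvidia/mit-b1")
model = SegformerForImageClassification.from_pretrained("nvidia/mit-b1")
model.eval()

# Resize and normalize the image, then run inference without tracking gradients
inputs = image_processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits

# The model predicts one of the 1000 ImageNet classes
predicted_class_idx = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])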

For more code examples, see the documentation.

📄 License

The license for this model can be found here.

📚 Technical Details

BibTeX entry and citation info

@article{DBLP:journals/corr/abs-2105-15203,
  author    = {Enze Xie and
               Wenhai Wang and
               Zhiding Yu and
               Anima Anandkumar and
               Jose M. Alvarez and
               Ping Luo},
  title     = {SegFormer: Simple and Efficient Design for Semantic Segmentation with
               Transformers},
  journal   = {CoRR},
  volume    = {abs/2105.15203},
  year      = {2021},
  url       = {https://arxiv.org/abs/2105.15203},
  eprinttype = {arXiv},
  eprint    = {2105.15203},
  timestamp = {Wed, 02 Jun 2021 11:46:42 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2105-15203.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Property	Details
Model Type	SegFormer (b1-sized) encoder pre-trained-only
Training Data	ImageNet-1k
