SegFormer Open-Source Semantic Segmentation Model - Free Deployment to Support 512x512 Image Segmentation Tasks

SegFormer B0 Fine-tuned ADE 512x512

Developed by NVIDIA

SegFormer is a Transformer-based semantic segmentation model fine-tuned on the ADE20k dataset, suitable for 512x512 resolution image segmentation tasks.

Image Segmentation

Transformers

Open Source License: Other

#512x512 image segmentation #ADE20k specialized #Transformer architecture


Release Time: 3/2/2022

Model Overview

This model pairs a hierarchical Transformer encoder with a lightweight all-MLP decode head. It is designed for semantic segmentation and performs well on benchmarks such as ADE20K.

Model Features

Hierarchical Transformer encoder

Uses a hierarchical Transformer encoder to capture features at multiple scales

Lightweight MLP decoder head

Uses a lightweight decode head built entirely from MLP layers to keep inference efficient

512x512 resolution support

Fine-tuned on ADE20k at 512x512 resolution

Model Capabilities

Image semantic segmentation

Scene parsing

Pixel-level classification

Use Cases

Scene understanding

House scene parsing

Performs semantic segmentation on house images to identify different architectural elements

Castle scene parsing

Performs semantic segmentation on castle images to identify different architectural features

Urban planning

Urban landscape analysis

Analyzes urban landscape images to identify elements like roads, buildings, green spaces, etc.
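As a rough illustration of the landscape analysis described above, the sketch below turns a per-pixel label map into the share of the image covered by each class. The function name `class_shares`, the toy label map, and the three-entry `toy_id2label` mapping are hypothetical and chosen only for illustration; with the real model, the class names would come from `model.config.id2label`.

```python
from collections import Counter


def class_shares(label_map, id2label):
    """Return the fraction of pixels assigned to each class name."""
    # Count how often each class id appears across all rows of the label map.
    counts = Counter(label for row in label_map for label in row)
    total = sum(counts.values())
    return {id2label.get(i, f"class_{i}"): n / total for i, n in counts.items()}


# Hypothetical mapping and a tiny 4x4 label map, not the real ADE20k ids.
toy_id2label = {0: "road", 1: "building", 2: "green space"}
toy_map = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

shares = class_shares(toy_map, toy_id2label)
for name, share in sorted(shares.items()):
    print(f"{name}: {share:.0%}")
```

In practice the label map would be the per-pixel argmax of the model's logits, converted to a nested list or processed with an array library instead.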

🚀 SegFormer (b0-sized) model fine-tuned on ADE20k

A SegFormer model fine-tuned on ADE20k at 512x512 resolution, offering high-performance semantic segmentation.

🚀 Quick Start

This SegFormer model is fine-tuned on ADE20k at a resolution of 512x512. It was introduced in the paper SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers by Xie et al. and first released in this repository.

Disclaimer: The team releasing SegFormer did not write a model card for this model, so this model card has been written by the Hugging Face team.

✨ Features

Hierarchical Design: SegFormer consists of a hierarchical Transformer encoder and a lightweight all-MLP decode head, achieving great results on semantic segmentation benchmarks such as ADE20K and Cityscapes.
Pre-training and Fine-tuning: The hierarchical Transformer is first pre-trained on ImageNet-1k. Then, a decode head is added and the whole model is fine-tuned on a downstream dataset.

💻 Usage Examples

Basic Usage

Here is how to use this model to run semantic segmentation on an image from the COCO 2017 dataset:

from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation
from PIL import Image
import requests

# Load the image processor and the fine-tuned model from the Hugging Face Hub
processor = SegformerImageProcessor.from_pretrained("nvidia/segformer-b0-finetuned-ade-512-512")
model = SegformerForSemanticSegmentation.from_pretrained("nvidia/segformer-b0-finetuned-ade-512-512")

# Download a sample image from the COCO 2017 validation set
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Preprocess the image and run a forward pass
inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits  # shape (batch_size, num_labels, height/4, width/4)
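The logits come out at a quarter of the input resolution, so a per-pixel prediction needs an upsampling step followed by an argmax over the class dimension. The following is a minimal sketch of that step, not necessarily how the library does it internally. The helper name `logits_to_label_map` is hypothetical, and random values stand in for `outputs.logits` so the block runs without downloading the model; the 150-class assumption follows from the `scene_parse_150` training data.

```python
import torch
import torch.nn.functional as F


def logits_to_label_map(logits, target_size):
    """Upsample (batch, num_labels, h, w) logits to target_size, then pick the best class per pixel."""
    upsampled = F.interpolate(
        logits, size=target_size, mode="bilinear", align_corners=False
    )
    return upsampled.argmax(dim=1)  # shape (batch, height, width)


# Stand-in for outputs.logits: assumed 150 classes at 1/4 of a 512x512 input.
dummy_logits = torch.randn(1, 150, 128, 128)
label_map = logits_to_label_map(dummy_logits, target_size=(512, 512))
print(label_map.shape)
```

With a real image, `target_size` would typically be `image.size[::-1]`, because PIL reports size as (width, height). The image processor also provides `post_process_semantic_segmentation`, which may be a more convenient way to get the same kind of label map.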

Advanced Usage

For more code examples, see the documentation.

📚 Documentation

Model description

SegFormer consists of a hierarchical Transformer encoder and a lightweight all-MLP decode head to achieve great results on semantic segmentation benchmarks such as ADE20K and Cityscapes. The hierarchical Transformer is first pre-trained on ImageNet-1k, after which a decode head is added and the whole model is fine-tuned on a downstream dataset.
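To make "hierarchical" concrete, the sketch below computes the spatial size of the feature map each encoder stage would produce for a 512x512 input. It assumes the four stages downsample by 4, 8, 16, and 32, as described in the SegFormer paper; the names `STAGE_STRIDES` and `stage_feature_sizes` are hypothetical and not part of any library.

```python
# Assumed downsampling factor of each encoder stage relative to the input.
STAGE_STRIDES = (4, 8, 16, 32)


def stage_feature_sizes(height, width, strides=STAGE_STRIDES):
    """Return the (height, width) of the feature map produced at each stage."""
    return [(height // s, width // s) for s in strides]


sizes = stage_feature_sizes(512, 512)
for stage, (h, w) in enumerate(sizes, start=1):
    print(f"stage {stage}: {h}x{w}")
```

The decode head fuses these multi-scale maps at the finest (1/4) resolution, which matches the `height/4, width/4` shape of the logits in the usage example.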

Intended uses & limitations

You can use the raw model for semantic segmentation. See the model hub to look for fine-tuned versions on a task that interests you.

📄 License

The license for this model can be found here.

🔧 Technical Details

BibTeX entry and citation info

@article{DBLP:journals/corr/abs-2105-15203,
  author    = {Enze Xie and
               Wenhai Wang and
               Zhiding Yu and
               Anima Anandkumar and
               Jose M. Alvarez and
               Ping Luo},
  title     = {SegFormer: Simple and Efficient Design for Semantic Segmentation with
               Transformers},
  journal   = {CoRR},
  volume    = {abs/2105.15203},
  year      = {2021},
  url       = {https://arxiv.org/abs/2105.15203},
  eprinttype = {arXiv},
  eprint    = {2105.15203},
  timestamp = {Wed, 02 Jun 2021 11:46:42 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2105-15203.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Property	Details
Model Type	SegFormer (b0-sized) fine-tuned on ADE20k
Training Data	scene_parse_150
