
SegFormer-B3 Fine-tuned on ADE20K (512×512)

Developed by NVIDIA
This SegFormer model was fine-tuned on the ADE20K dataset at 512×512 resolution. It features a hierarchical Transformer encoder and a lightweight all-MLP decode head.
Downloads 13.13k
Release Time: 3/2/2022

Model Overview

SegFormer is a Transformer-based semantic segmentation model suitable for image segmentation tasks, demonstrating excellent performance on benchmarks like ADE20K and Cityscapes.
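To make the input/output contract concrete, here is a minimal sketch using the Hugging Face Transformers `SegformerForSemanticSegmentation` class. It builds a randomly initialized SegFormer from a config (so it runs without downloading weights); in practice you would load the released checkpoint with `.from_pretrained("nvidia/segformer-b3-finetuned-ade-512-512")` instead. The config values here are illustrative assumptions, not the B3 checkpoint's exact hyperparameters.

```python
import torch
from transformers import SegformerConfig, SegformerForSemanticSegmentation

# Illustrative config: 150 labels matches ADE20K; other hyperparameters are
# library defaults, not necessarily those of the fine-tuned B3 checkpoint.
config = SegformerConfig(num_labels=150)
model = SegformerForSemanticSegmentation(config)
model.eval()

# Dummy batch standing in for a preprocessed 512x512 RGB image.
pixel_values = torch.randn(1, 3, 512, 512)

with torch.no_grad():
    logits = model(pixel_values=pixel_values).logits

# SegFormer predicts class logits at 1/4 of the input resolution,
# so a 512x512 input yields logits of shape (1, 150, 128, 128).
print(logits.shape)
```

With the real checkpoint, the same forward pass applies; only the weight loading and the image preprocessing (via the model's image processor) differ.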

Model Features

Efficient design
Utilizes hierarchical Transformer encoder and lightweight all-MLP decoder head architecture for concise and efficient semantic segmentation.
High performance
Demonstrates excellent performance on semantic segmentation benchmarks like ADE20K and Cityscapes.
Pre-training + Fine-tuning
The hierarchical Transformer is first pre-trained on ImageNet-1k, then the decoder head is added and jointly fine-tuned on downstream datasets.

Model Capabilities

Image semantic segmentation
Scene parsing

Use Cases

Computer vision
House scene parsing
Performs semantic segmentation on house images to identify different objects and regions.
Castle scene parsing
Performs semantic segmentation on castle images to identify different architectural structures and environmental elements.
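For scene-parsing use cases like the above, the model's quarter-resolution logits are usually upsampled back to the input size and reduced to a per-pixel class map. The sketch below shows that common post-processing step in plain PyTorch, using random logits as a stand-in for the model output.

```python
import torch
import torch.nn.functional as F

# Stand-in for SegFormer output: class logits at 1/4 resolution
# for a 512x512 input and 150 ADE20K classes.
num_classes, height, width = 150, 512, 512
logits = torch.randn(1, num_classes, height // 4, width // 4)

# Upsample to the original image size, then take the argmax over the
# class dimension to get one predicted class id per pixel.
upsampled = F.interpolate(
    logits, size=(height, width), mode="bilinear", align_corners=False
)
label_map = upsampled.argmax(dim=1)  # shape (1, 512, 512)

print(label_map.shape)
```

Each value in `label_map` is an ADE20K class index (0–149), which can then be mapped to object or region names for house or castle scene parsing.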