SegFormer-b5 Open-source Image Segmentation Model - Free Deployment for Accurate Image Segmentation Tasks

Segformer B5 Finetuned Ade 640 640

Developed by nvidia

SegFormer is a Transformer-based semantic segmentation model fine-tuned on the ADE20k dataset, suitable for image segmentation tasks.

Image Segmentation

Transformers

Open Source License: Other

#Semantic Segmentation #Transformer Architecture #ADE20k Dataset

Downloads 42.32k

Release Time: 3/2/2022

Model Overview

This model pairs a hierarchical Transformer encoder with a lightweight all-MLP decoder and targets semantic segmentation, particularly scene parsing.

Model Features

Hierarchical Transformer Architecture

Uses a hierarchical Transformer encoder to capture features at multiple scales

Lightweight MLP Decoder

Employs an all-MLP structured decoder to maintain high performance while reducing computational complexity

ADE20k Dataset Fine-tuning

Specifically optimized on the scene parsing benchmark dataset ADE20k

Model Capabilities

Image Semantic Segmentation

Scene Parsing

Pixel-level Classification

Use Cases

Computer Vision

Building Scene Parsing

Performs pixel-level semantic segmentation on building images to identify different architectural elements

Indoor Scene Understanding

Analyzes indoor scene images to segment and recognize different objects like furniture and walls
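
For use cases like these, the per-pixel class ids usually need to be turned into something readable, such as the share of the image covered by each class. The sketch below shows one way to do that in plain Python. The function name `class_coverage`, the toy map, and the toy label table are all hypothetical; with the real model the label table would typically come from `model.config.id2label`.

```python
from collections import Counter


def class_coverage(seg_map, id2label):
    """Return the fraction of pixels assigned to each label in a 2D map of class ids."""
    counts = Counter(class_id for row in seg_map for class_id in row)
    total = sum(counts.values())
    return {id2label[class_id]: n / total for class_id, n in counts.items()}


# Hypothetical 3x4 segmentation map and label table
# (illustrative ids only, not the real ADE20k mapping).
toy_map = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [2, 2, 2, 2],
]
toy_labels = {0: "wall", 1: "sofa", 2: "floor"}

coverage = class_coverage(toy_map, toy_labels)
print(coverage)
```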

🚀 SegFormer (b5-sized) model fine-tuned on ADE20k

A SegFormer model fine-tuned on ADE20k for high-performance image segmentation.

🚀 Quick Start

You can use the raw model for semantic segmentation. See the model hub to look for fine-tuned versions on a task that interests you.

✨ Features

Hierarchical Transformer Encoder: The model uses a hierarchical Transformer encoder, which is first pre-trained on ImageNet-1k.
Lightweight All-MLP Decode Head: A lightweight all-MLP decode head is added and fine-tuned together with the encoder on a downstream dataset, achieving strong results on semantic segmentation benchmarks such as ADE20K and Cityscapes.
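
To make "hierarchical" concrete, the sketch below computes the feature-map shape produced by each encoder stage for a given input size. The strides (4, 8, 16, 32) and the b5 channel widths (64, 128, 320, 512) are assumptions taken from the SegFormer paper rather than from this page, and `encoder_stage_shapes` is a hypothetical helper, not part of any library.

```python
def encoder_stage_shapes(height, width,
                         channels=(64, 128, 320, 512),
                         strides=(4, 8, 16, 32)):
    """Return (channels, height, width) for each encoder stage's feature map.

    Defaults are assumed values for the b5 variant.
    """
    return [(c, height // s, width // s) for c, s in zip(channels, strides)]


# A 640x640 input, matching the resolution in this checkpoint's name.
shapes = encoder_stage_shapes(640, 640)
for stage, shape in enumerate(shapes, start=1):
    print(f"stage {stage}: {shape}")
```

The first stage keeps a quarter of the input resolution, which is why the model's logits come out at height/4 by width/4.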

💻 Usage Examples

Basic Usage

Here is how to use this model to run semantic segmentation on an image from the COCO 2017 dataset, assigning each pixel one of the 150 ADE20k classes:

from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation
from PIL import Image
import requests

# Load the preprocessing config and weights for the 640x640 ADE20k checkpoint.
# SegformerImageProcessor supersedes the deprecated SegformerFeatureExtractor.
checkpoint = "nvidia/segformer-b5-finetuned-ade-640-640"
image_processor = SegformerImageProcessor.from_pretrained(checkpoint)
model = SegformerForSemanticSegmentation.from_pretrained(checkpoint)

# Download a sample image from the COCO 2017 validation set.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Resize and normalize the image, then run a forward pass.
inputs = image_processor(images=image, return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits  # shape (batch_size, num_labels, height/4, width/4)

For more code examples, see the documentation.
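
Because the logits are returned at a quarter of the input resolution, a common next step is to upsample them to the image size and take the per-pixel argmax. The sketch below shows that step on random stand-in logits so it runs without downloading the model. The helper name `logits_to_segmentation` is hypothetical, and the 160x160 logit size assumes a 640x640 input.

```python
import torch
import torch.nn.functional as F


def logits_to_segmentation(logits, target_size):
    """Upsample (B, num_labels, h, w) logits to target_size=(H, W), then argmax per pixel."""
    upsampled = F.interpolate(
        logits, size=target_size, mode="bilinear", align_corners=False
    )
    return upsampled.argmax(dim=1)  # (B, H, W) tensor of integer class ids


# Stand-in for outputs.logits: random values, 150 classes, 160x160 (assumed 640/4).
dummy_logits = torch.randn(1, 150, 160, 160)
seg = logits_to_segmentation(dummy_logits, (640, 640))
print(seg.shape)
```

With a real PIL image, the target size would be `image.size[::-1]`, since PIL reports (width, height). Recent versions of transformers also appear to provide a `post_process_semantic_segmentation` method on the image processor that wraps the same steps; check that your installed version has it before relying on it.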

📚 Documentation

SegFormer consists of a hierarchical Transformer encoder and a lightweight all-MLP decode head, and achieves strong results on semantic segmentation benchmarks such as ADE20K and Cityscapes. The hierarchical Transformer is first pre-trained on ImageNet-1k, after which a decode head is added and the whole model is fine-tuned on a downstream dataset.
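
The decode head described above can be sketched as a small PyTorch module: project each encoder stage to a common width with a linear layer, upsample everything to the first stage's resolution, concatenate, fuse, and classify. This is a simplified illustration, not the library's implementation. The class name `AllMLPDecodeHead`, the channel widths (64, 128, 320, 512), and the embedding width of 768 are assumptions based on the SegFormer paper's b5 configuration, and details such as normalization and dropout are omitted.

```python
import torch
from torch import nn
import torch.nn.functional as F


class AllMLPDecodeHead(nn.Module):
    """Simplified all-MLP decode head (assumed b5 sizes, no norm or dropout)."""

    def __init__(self, in_channels=(64, 128, 320, 512), embed_dim=768, num_labels=150):
        super().__init__()
        # One linear projection per encoder stage, mapping to a shared width.
        self.proj = nn.ModuleList([nn.Linear(c, embed_dim) for c in in_channels])
        # Fuse the concatenated stages back down to embed_dim.
        self.fuse = nn.Linear(embed_dim * len(in_channels), embed_dim)
        self.classifier = nn.Linear(embed_dim, num_labels)

    def forward(self, features):
        # features: list of (B, C_i, H_i, W_i) tensors, highest resolution first.
        target_size = tuple(features[0].shape[-2:])
        unified = []
        for feat, proj in zip(features, self.proj):
            x = proj(feat.permute(0, 2, 3, 1))  # linear acts on the channel axis
            x = x.permute(0, 3, 1, 2)           # back to (B, embed_dim, H_i, W_i)
            x = F.interpolate(x, size=target_size, mode="bilinear", align_corners=False)
            unified.append(x)
        fused = torch.cat(unified, dim=1).permute(0, 2, 3, 1)
        fused = torch.relu(self.fuse(fused))
        logits = self.classifier(fused)         # (B, H/4, W/4, num_labels)
        return logits.permute(0, 3, 1, 2)       # (B, num_labels, H/4, W/4)


# Random stand-in features for a 64x64 input at strides 4, 8, 16, 32.
head = AllMLPDecodeHead()
feats = [torch.randn(1, c, 64 // s, 64 // s)
         for c, s in zip((64, 128, 320, 512), (4, 8, 16, 32))]
out = head(feats)
print(out.shape)
```

Keeping the head to linear layers and interpolation is what makes it lightweight: nearly all of the model's capacity sits in the encoder.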

📄 License

The license for this model can be found here.

📚 BibTeX entry and citation info

@article{DBLP:journals/corr/abs-2105-15203,
  author    = {Enze Xie and
               Wenhai Wang and
               Zhiding Yu and
               Anima Anandkumar and
               Jose M. Alvarez and
               Ping Luo},
  title     = {SegFormer: Simple and Efficient Design for Semantic Segmentation with
               Transformers},
  journal   = {CoRR},
  volume    = {abs/2105.15203},
  year      = {2021},
  url       = {https://arxiv.org/abs/2105.15203},
  eprinttype = {arXiv},
  eprint    = {2105.15203},
  timestamp = {Wed, 02 Jun 2021 11:46:42 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2105-15203.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Property	Details
Model Type	SegFormer (b5-sized) fine-tuned on ADE20k
Training Data	scene_parse_150
