SegFormer Open-source Image Segmentation Model - Free Deployment for High-precision Image Segmentation Tasks

segformer-b1-finetuned-ade-512-512

Developed by NVIDIA

SegFormer is a Transformer-based semantic segmentation model fine-tuned on the ADE20K dataset, suitable for image segmentation tasks.

Image Segmentation

Transformers

Open Source License: Other #512x512 semantic segmentation #ADE20k scene parsing #Transformer architecture

Downloads 560.79k

Release Time: 3/2/2022

Model Overview

This model combines a hierarchical Transformer encoder with a lightweight all-MLP decode head. It is designed for semantic segmentation and was fine-tuned on the ADE20k dataset at 512x512 resolution.

Model Features

Hierarchical Transformer encoder

Adopts a hierarchical Transformer architecture that effectively captures image features at different scales.
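To make "different scales" concrete, the sketch below computes the feature-map shape produced by each encoder stage. It assumes the MiT-b1 settings reported in the SegFormer paper (cumulative strides of 4, 8, 16 and 32, with 64, 128, 320 and 512 channels); `stage_shapes` is a hypothetical helper for illustration, not part of any library.

```python
# Assumed MiT-b1 encoder settings (from the SegFormer paper).
STAGE_STRIDES = (4, 8, 16, 32)        # cumulative downsampling per stage
MIT_B1_CHANNELS = (64, 128, 320, 512)


def stage_shapes(height, width):
    """Return a (channels, height, width) tuple for each encoder stage."""
    shapes = []
    for stride, channels in zip(STAGE_STRIDES, MIT_B1_CHANNELS):
        # The overlapping patch embeddings round up when a side is not
        # divisible by the stride, hence ceiling division.
        shapes.append((channels, -(-height // stride), -(-width // stride)))
    return shapes


for i, shape in enumerate(stage_shapes(512, 512), start=1):
    print(f"stage {i}: {shape}")
```

For a 512x512 input this yields (64, 128, 128) for stage 1 down to (512, 16, 16) for stage 4: fine, high-resolution features early and coarse, channel-rich features late.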

Lightweight MLP decoder head

Uses an all-MLP decoder head design to maintain high performance while reducing computational complexity.
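The decode head's data flow can be traced with shapes alone. This minimal sketch follows the four steps described in the SegFormer paper: a per-stage linear projection, upsampling to 1/4 resolution, concatenation followed by a fusing linear layer, and a per-pixel classifier. The decoder width of 256 (the paper's setting for B0 and B1), the 150 ADE20k classes, and the helper name `decode_head_shapes` are assumptions for this example.

```python
def decode_head_shapes(stage_shapes, hidden=256, num_classes=150):
    """List (step, (channels, height, width)) through an all-MLP decode head."""
    # Everything is brought to the resolution of the first (stride-4) stage.
    _, out_h, out_w = stage_shapes[0]
    steps = []
    for i, (channels, h, w) in enumerate(stage_shapes, start=1):
        # 1) Per-pixel linear layer: channels -> hidden, resolution unchanged.
        steps.append((f"stage {i} linear {channels}->{hidden}", (hidden, h, w)))
        # 2) Upsample to the stride-4 resolution.
        steps.append((f"stage {i} upsample", (hidden, out_h, out_w)))
    # 3) Concatenate along channels, then fuse back to `hidden` channels.
    steps.append(("concat", (hidden * len(stage_shapes), out_h, out_w)))
    steps.append(("fuse linear", (hidden, out_h, out_w)))
    # 4) Per-pixel classifier: one logit per class.
    steps.append(("classify linear", (num_classes, out_h, out_w)))
    return steps


# Assumed MiT-b1 stage outputs for a 512x512 input.
B1_STAGES_512 = [(64, 128, 128), (128, 64, 64), (320, 32, 32), (512, 16, 16)]
for name, shape in decode_head_shapes(B1_STAGES_512):
    print(f"{name:<28} {shape}")
```

The final shape, (150, 128, 128), matches the (batch_size, num_labels, height/4, width/4) logits shape of the model output. Because every step is either a per-pixel linear map or an upsample, the head adds little computation on top of the encoder.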

512x512 resolution optimization

Fine-tuned on 512x512 inputs, which makes it well suited to medium-resolution segmentation tasks.

Model Capabilities

Image semantic segmentation

Scene understanding

Object boundary recognition
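The model predicts a class for every pixel rather than explicit edges, so object boundaries are usually derived from the predicted label map. A minimal sketch, assuming the prediction has already been reduced to a 2-D list of class ids (for example by taking the per-pixel argmax of the logits); `boundary_mask` is a hypothetical helper that marks the pixels on both sides of every class change, using 4-connectivity.

```python
def boundary_mask(label_map):
    """Return a same-sized grid of booleans marking class transitions."""
    height, width = len(label_map), len(label_map[0])
    edges = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            here = label_map[y][x]
            # Compare with the right-hand neighbour.
            if x + 1 < width and label_map[y][x + 1] != here:
                edges[y][x] = edges[y][x + 1] = True
            # Compare with the neighbour below.
            if y + 1 < height and label_map[y + 1][x] != here:
                edges[y][x] = edges[y + 1][x] = True
    return edges


toy_map = [
    [0, 0, 1],
    [0, 0, 1],
    [2, 2, 1],
]
for row in boundary_mask(toy_map):
    print("".join("#" if edge else "." for edge in row))
```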

Use Cases

Scene parsing

House scene segmentation

Performs semantic segmentation on house images to identify architectural elements like walls, doors, and windows.

Castle scene parsing

Analyzes castle images to segment different architectural structures and landscape elements.

Urban landscape analysis

Urban street scene segmentation

Identifies and segments elements in urban street scenes such as roads, vehicles, and pedestrians.
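For use cases like these, a common follow-up is to measure how much of the image each predicted element covers. The sketch below assumes a predicted label map stored as a 2-D list of class ids; the three-class `TOY_ID2LABEL` mapping and the toy map are hypothetical, and with the real model the `model.config.id2label` dictionary could be passed instead.

```python
from collections import Counter


def class_coverage(label_map, id2label):
    """Return {class name: fraction of pixels}, largest classes first."""
    counts = Counter(pixel for row in label_map for pixel in row)
    total = sum(counts.values())
    # Fall back to the raw id when a class is missing from id2label.
    return {id2label.get(cls, str(cls)): n / total for cls, n in counts.most_common()}


# Hypothetical labels and a toy 3x4 prediction, for illustration only.
TOY_ID2LABEL = {0: "wall", 1: "window", 2: "door"}
toy_map = [
    [0, 0, 0, 1],
    [0, 2, 0, 1],
    [0, 2, 0, 0],
]
for name, share in class_coverage(toy_map, TOY_ID2LABEL).items():
    print(f"{name}: {share:.1%}")
```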

🚀 SegFormer (b1-sized) model fine-tuned on ADE20k

A SegFormer model fine-tuned on ADE20k at 512x512 resolution for semantic segmentation.

🚀 Quick Start

The SegFormer model is fine-tuned on ADE20k at a resolution of 512x512. It was introduced in the paper SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers by Xie et al. and first released in this repository.

Disclaimer: The team releasing SegFormer did not write a model card for this model, so this model card has been written by the Hugging Face team.

✨ Features

SegFormer consists of a hierarchical Transformer encoder and a lightweight all-MLP decode head, and achieves strong results on semantic segmentation benchmarks such as ADE20K and Cityscapes. The hierarchical Transformer is first pre-trained on ImageNet-1k; a decode head is then added, and the whole model is fine-tuned on a downstream dataset.
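Results on ADE20K and Cityscapes are conventionally reported as mean intersection-over-union (mIoU). The sketch below computes mIoU for a single prediction/ground-truth pair of 2-D label maps. It assumes class ids in `range(num_classes)` and uses 255 as the ignore label, a common convention rather than something this card specifies. Benchmark mIoU is accumulated over the whole evaluation set before averaging, so this per-image version is only illustrative.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes that appear in the prediction or the target."""
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            if t == ignore_index:
                continue  # unlabeled pixels do not count
            if p == t:
                intersection[t] += 1
                union[t] += 1
            else:
                # A mismatch enlarges the union of both classes involved.
                union[p] += 1
                union[t] += 1
    # Skip classes absent from both maps (their IoU is undefined).
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


pred = [[0, 0, 1], [1, 1, 2]]
target = [[0, 1, 1], [1, 1, 2]]
print(f"mIoU: {mean_iou(pred, target, num_classes=3):.3f}")
```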

💻 Usage Examples

Basic Usage

Here is how to use this model to segment an image from the COCO 2017 dataset into the 150 ADE20k classes:

import requests
import torch
from PIL import Image
from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation

# SegformerImageProcessor replaces the deprecated SegformerFeatureExtractor
image_processor = SegformerImageProcessor.from_pretrained("nvidia/segformer-b1-finetuned-ade-512-512")
model = SegformerForSemanticSegmentation.from_pretrained("nvidia/segformer-b1-finetuned-ade-512-512")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Resize and normalize the image, returning PyTorch tensors
inputs = image_processor(images=image, return_tensors="pt")
with torch.no_grad():  # inference only, so skip gradient tracking
    outputs = model(**inputs)
logits = outputs.logits  # shape (batch_size, num_labels, height/4, width/4)

For more code examples, refer to the documentation.
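The logits are at 1/4 of the input resolution, so turning them into a per-pixel label map at image size takes an argmax over classes plus upsampling. In transformers, the SegFormer image processor's `post_process_semantic_segmentation(outputs, target_sizes=...)` method handles this (it upsamples the logits bilinearly, then takes the argmax). The pure-Python sketch below only illustrates the idea: it takes the argmax first and then upsamples with nearest-neighbour sampling, so results can differ slightly along class boundaries. Logits are assumed to be a nested list shaped `[num_labels][h][w]`, and `logits_to_label_map` is a hypothetical helper.

```python
def logits_to_label_map(logits, out_height, out_width):
    """Turn [num_labels][h][w] logits into an [out_height][out_width] label map."""
    num_labels, h, w = len(logits), len(logits[0]), len(logits[0][0])
    # Per-pixel argmax over the class dimension at the low resolution
    # (ties resolve to the lowest class id).
    low_res = [
        [max(range(num_labels), key=lambda c: logits[c][y][x]) for x in range(w)]
        for y in range(h)
    ]
    # Nearest-neighbour upsampling to the requested output size.
    return [
        [low_res[y * h // out_height][x * w // out_width] for x in range(out_width)]
        for y in range(out_height)
    ]


# Toy logits: 2 classes on a 2x2 grid, upsampled to 4x4.
toy_logits = [
    [[1.0, 0.0], [0.0, 1.0]],  # class 0 scores
    [[0.0, 1.0], [1.0, 0.0]],  # class 1 scores
]
for row in logits_to_label_map(toy_logits, 4, 4):
    print(row)
```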

📚 Documentation

Intended uses & limitations

You can use the raw model for semantic segmentation. See the model hub for fine-tuned versions on a task that interests you.

BibTeX entry and citation info

@article{DBLP:journals/corr/abs-2105-15203,
  author    = {Enze Xie and
               Wenhai Wang and
               Zhiding Yu and
               Anima Anandkumar and
               Jose M. Alvarez and
               Ping Luo},
  title     = {SegFormer: Simple and Efficient Design for Semantic Segmentation with
               Transformers},
  journal   = {CoRR},
  volume    = {abs/2105.15203},
  year      = {2021},
  url       = {https://arxiv.org/abs/2105.15203},
  eprinttype = {arXiv},
  eprint    = {2105.15203},
  timestamp = {Wed, 02 Jun 2021 11:46:42 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2105-15203.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

📄 License

License: other

Information Table

Property	Details
Tags	vision, image-segmentation
Datasets	scene_parse_150
