# SegFormer (b3-sized) model fine-tuned on CityScapes
A SegFormer model fine-tuned on the CityScapes dataset at a resolution of 1024x1024, offering high-performance semantic segmentation.
## Quick Start
SegFormer is a model fine-tuned on CityScapes at a resolution of 1024x1024. It was introduced in the paper [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203) by Xie et al. and first released in this repository.
Disclaimer: The team releasing SegFormer did not write a model card for this model, so this model card has been written by the Hugging Face team.
## Features
- Powerful Architecture: SegFormer consists of a hierarchical Transformer encoder and a lightweight all-MLP decode head, achieving strong results on semantic segmentation benchmarks such as ADE20K and Cityscapes.
- Pre-training and Fine-tuning: The hierarchical Transformer encoder is first pre-trained on ImageNet-1k; a decode head is then added and the whole model is fine-tuned on a downstream dataset (see the sketch below).
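As a concrete illustration of that workflow, here is a minimal, hypothetical sketch of starting a downstream fine-tune from a pre-trained encoder. The `nvidia/mit-b3` checkpoint id and the label count are assumptions for illustration, not details taken from this card:

```python
from transformers import SegformerForSemanticSegmentation

# Hypothetical fine-tuning setup: load an ImageNet-1k pre-trained hierarchical
# encoder (checkpoint id assumed) and attach a randomly initialized decode head
# sized for the downstream label set.
model = SegformerForSemanticSegmentation.from_pretrained(
    "nvidia/mit-b3",  # assumed id of the pre-trained b3 encoder
    num_labels=19,    # e.g. the 19 Cityscapes evaluation classes
)
# From here, train as usual (e.g. with the Trainer API) on the downstream dataset.
```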
## Documentation
### Model description
SegFormer consists of a hierarchical Transformer encoder and a lightweight all-MLP decode head, which together achieve strong results on semantic segmentation benchmarks such as ADE20K and Cityscapes. The hierarchical Transformer encoder is first pre-trained on ImageNet-1k, after which a decode head is added and the whole model is fine-tuned on a downstream dataset.
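To see how this hierarchy is reflected in the checkpoint, you can inspect its configuration. This is a minimal sketch assuming the `transformers` library is installed; the exact values printed depend on the b3 variant:

```python
from transformers import SegformerConfig

# Inspect the architecture of this checkpoint.
config = SegformerConfig.from_pretrained(
    "nvidia/segformer-b3-finetuned-cityscapes-1024-1024"
)

print(config.num_encoder_blocks)   # number of hierarchical encoder stages
print(config.depths)               # Transformer layers per stage
print(config.hidden_sizes)         # channel width of each stage
print(config.decoder_hidden_size)  # width of the all-MLP decode head
print(config.num_labels)           # size of the Cityscapes label set
```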
### Intended uses & limitations
You can use the raw model for semantic segmentation. See the model hub to look for fine-tuned versions on a task that interests you.
## Usage Examples
### Basic Usage
Here is how to use this model to segment an image from the COCO 2017 dataset into the Cityscapes classes:
```python
from transformers import SegformerFeatureExtractor, SegformerForSemanticSegmentation
from PIL import Image
import requests

# Load the feature extractor and the fine-tuned model from the Hub.
feature_extractor = SegformerFeatureExtractor.from_pretrained("nvidia/segformer-b3-finetuned-cityscapes-1024-1024")
model = SegformerForSemanticSegmentation.from_pretrained("nvidia/segformer-b3-finetuned-cityscapes-1024-1024")

# Download a test image from the COCO 2017 validation set.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Preprocess the image and run a forward pass.
inputs = feature_extractor(images=image, return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits  # shape (batch_size, num_labels, height/4, width/4)
```
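The logits come out at one quarter of the input resolution. A common follow-up step, not shown in the snippet above, is to upsample them to the original image size and take a per-pixel argmax to get a segmentation map; a minimal sketch assuming `torch` is installed:

```python
import torch

# Upsample the logits to the original image size and convert to a label map.
upsampled_logits = torch.nn.functional.interpolate(
    logits,
    size=image.size[::-1],  # PIL gives (width, height); interpolate wants (height, width)
    mode="bilinear",
    align_corners=False,
)
predicted_map = upsampled_logits.argmax(dim=1)[0]  # (height, width) tensor of class ids
```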
For more code examples, we refer to the documentation.
## BibTeX entry and citation info
```bibtex
@article{DBLP:journals/corr/abs-2105-15203,
  author     = {Enze Xie and
                Wenhai Wang and
                Zhiding Yu and
                Anima Anandkumar and
                Jose M. Alvarez and
                Ping Luo},
  title      = {SegFormer: Simple and Efficient Design for Semantic Segmentation with
                Transformers},
  journal    = {CoRR},
  volume     = {abs/2105.15203},
  year       = {2021},
  url        = {https://arxiv.org/abs/2105.15203},
  eprinttype = {arXiv},
  eprint     = {2105.15203},
  timestamp  = {Wed, 02 Jun 2021 11:46:42 +0200},
  biburl     = {https://dblp.org/rec/journals/corr/abs-2105-15203.bib},
  bibsource  = {dblp computer science bibliography, https://dblp.org}
}
```
## License
License: other

| Property | Details |
|----------|---------|
| Model Type | Vision, Image-Segmentation |
| Training Data | Cityscapes |