SegFormer Open-Source Semantic Segmentation Model - Free Deployment for Precise Road Scene Segmentation Tasks

Segformer B0 Finetuned Cityscapes 512 1024

Developed by nvidia

SegFormer is a Transformer-based semantic segmentation model fine-tuned on the Cityscapes dataset, suitable for road scene segmentation tasks.

Image Segmentation

Transformers

Open Source License: Other #Urban road segmentation #Transformer architecture #512x1024 high resolution

Downloads 1,111

Release Time: 3/2/2022

Model Overview

This model combines a hierarchical Transformer encoder with a lightweight all-MLP decode head. It is fine-tuned on the Cityscapes dataset at 512x1024 resolution for semantic segmentation.
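To make the multi-scale idea concrete, here is a minimal sketch of the feature-map sizes a hierarchical encoder would produce for a 512x1024 input. It assumes the stage strides of 4, 8, 16 and 32 described in the SegFormer paper; the function name `stage_resolutions` is hypothetical and not part of any library.

```python
def stage_resolutions(height, width, strides=(4, 8, 16, 32)):
    """Return the (height, width) of each encoder stage's feature map.

    The strides are an assumption taken from the SegFormer paper.
    """
    return [(height // s, width // s) for s in strides]


sizes = stage_resolutions(512, 1024)
for stride, (h, w) in zip((4, 8, 16, 32), sizes):
    print(f"stride {stride:>2}: {h} x {w}")
```

The first entry (128 x 256) is the 1/4-scale resolution at which the model emits its logits.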

Model Features

Hierarchical Transformer architecture

Adopts a hierarchical Transformer encoder to effectively capture multi-scale features

Lightweight MLP decoder head

Uses a lightweight all-MLP decoder head to maintain efficient inference speed

Cityscapes optimization

Specifically fine-tuned and optimized for the Cityscapes road scene dataset

Model Capabilities

Road scene semantic segmentation

High-resolution image processing (512x1024)

Multi-class pixel-level classification

Use Cases

Intelligent transportation

Autonomous driving scene understanding

Identifies traffic elements such as roads, pedestrians, and vehicles

Example images demonstrate accurate segmentation of road scenes

Urban digitization

Street view image analysis

Performs semantic segmentation of urban street scenes to support city planning
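For street-scene analysis, one simple statistic is the share of pixels assigned to each class. The sketch below is illustrative only: it assumes the 19 standard Cityscapes training classes in their usual order (the authoritative mapping for this checkpoint is `model.config.id2label`), and `class_coverage` and the toy label map are hypothetical.

```python
from collections import Counter

# Assumed class order: the 19 standard Cityscapes training classes.
CITYSCAPES_CLASSES = [
    "road", "sidewalk", "building", "wall", "fence", "pole",
    "traffic light", "traffic sign", "vegetation", "terrain", "sky",
    "person", "rider", "car", "truck", "bus", "train",
    "motorcycle", "bicycle",
]


def class_coverage(label_map):
    """Return {class_name: fraction of pixels} for a 2D list of class ids."""
    counts = Counter(cid for row in label_map for cid in row)
    total = sum(counts.values())
    return {CITYSCAPES_CLASSES[cid]: n / total for cid, n in counts.items()}


# Toy 4x4 label map standing in for a predicted segmentation.
toy_label_map = [
    [10, 10, 10, 10],  # sky
    [2, 2, 8, 8],      # building, vegetation
    [0, 0, 0, 13],     # road, car
    [0, 0, 0, 0],      # road
]
coverage = class_coverage(toy_label_map)
for name, share in sorted(coverage.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {share:.1%}")
```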

🚀 SegFormer (b0-sized) model fine-tuned on Cityscapes

A SegFormer model fine-tuned on the Cityscapes dataset at a resolution of 512x1024 for semantic segmentation.

🚀 Quick Start

The SegFormer model presented here is fine-tuned on the Cityscapes dataset at a resolution of 512x1024. It was first introduced in the paper SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers by Xie et al. and initially released in this repository.

Disclaimer: The team that released SegFormer did not write a model card for this model. This model card has been written by the Hugging Face team.

✨ Features

Hierarchical Architecture: SegFormer is composed of a hierarchical Transformer encoder and a lightweight all-MLP decode head, which enables it to achieve strong results on semantic segmentation benchmarks such as ADE20K and Cityscapes.
Pre-training and Fine-tuning: The hierarchical Transformer is pre-trained on ImageNet-1k, and then a decode head is added and fine-tuned on a downstream dataset.

📚 Documentation

Model description

SegFormer consists of a hierarchical Transformer encoder and a lightweight all-MLP decode head, and achieves strong results on semantic segmentation benchmarks such as ADE20K and Cityscapes. The hierarchical Transformer is first pre-trained on ImageNet-1k, after which a decode head is added and the whole model is fine-tuned on a downstream dataset.
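The decode head described above can be sketched roughly as follows. This is a simplified illustration, not the library's implementation: the class name `TinyAllMLPHead` is hypothetical, the channel widths (32, 64, 160, 256) and fused width (256) are assumed values for a b0-sized encoder, and details such as normalization and dropout are omitted.

```python
import torch
from torch import nn
import torch.nn.functional as F


class TinyAllMLPHead(nn.Module):
    """Simplified all-MLP decode head: project, upsample, concatenate, fuse, classify."""

    def __init__(self, in_dims=(32, 64, 160, 256), embed_dim=256, num_labels=19):
        super().__init__()
        # One per-pixel linear projection (written as a 1x1 conv) per encoder stage.
        self.proj = nn.ModuleList(nn.Conv2d(d, embed_dim, kernel_size=1) for d in in_dims)
        # Fuse the concatenated features, then predict per-pixel class scores.
        self.fuse = nn.Conv2d(embed_dim * len(in_dims), embed_dim, kernel_size=1)
        self.classifier = nn.Conv2d(embed_dim, num_labels, kernel_size=1)

    def forward(self, features):
        # Bring every stage to the resolution of the first (1/4-scale) stage.
        target_size = features[0].shape[-2:]
        upsampled = [
            F.interpolate(p(f), size=target_size, mode="bilinear", align_corners=False)
            for p, f in zip(self.proj, features)
        ]
        fused = torch.relu(self.fuse(torch.cat(upsampled, dim=1)))
        return self.classifier(fused)


# Dummy encoder outputs for a 512x1024 input at strides 4, 8, 16 and 32.
features = [
    torch.randn(1, d, 512 // s, 1024 // s)
    for d, s in zip((32, 64, 160, 256), (4, 8, 16, 32))
]
logits = TinyAllMLPHead()(features)
print(logits.shape)  # torch.Size([1, 19, 128, 256])
```

Because the head only uses per-pixel projections and upsampling, most of the model's capacity sits in the encoder, which is the design point the description makes.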

Intended uses & limitations

You can use the raw model for semantic segmentation. See the model hub to look for fine-tuned versions on a task that interests you.

💻 Usage Examples

Basic Usage

Here is how to use this model to run semantic segmentation on an image from the COCO 2017 dataset:

from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation
from PIL import Image
import requests

# SegformerImageProcessor replaces the deprecated SegformerFeatureExtractor
processor = SegformerImageProcessor.from_pretrained("nvidia/segformer-b0-finetuned-cityscapes-512-1024")
model = SegformerForSemanticSegmentation.from_pretrained("nvidia/segformer-b0-finetuned-cityscapes-512-1024")

# Download a sample image from the COCO 2017 validation set
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Preprocess the image, then run a forward pass
inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits  # shape (batch_size, num_labels, height/4, width/4)
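The logits come out at one quarter of the preprocessed input resolution, so a typical next step is to upsample them and take the per-pixel argmax. The following is a minimal sketch of that step; `logits_to_label_map` is a hypothetical helper, and a random tensor stands in for `outputs.logits` so the snippet runs without downloading the model.

```python
import torch
import torch.nn.functional as F


def logits_to_label_map(logits, target_size):
    """Upsample (batch, num_labels, h, w) logits to target_size, then argmax per pixel."""
    upsampled = F.interpolate(logits, size=target_size, mode="bilinear", align_corners=False)
    return upsampled.argmax(dim=1)  # (batch, H, W) integer class ids


# Stand-in for outputs.logits: 19 classes at 1/4 of a 512x1024 input.
dummy_logits = torch.randn(1, 19, 128, 256)
label_map = logits_to_label_map(dummy_logits, target_size=(512, 1024))
print(label_map.shape)  # torch.Size([1, 512, 1024])
```

With a real PIL image, `target_size` would be `image.size[::-1]`, since PIL reports size as (width, height).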

For more code examples, we refer to the documentation.

📄 License

The license for this model can be found here.

BibTeX entry and citation info

@article{DBLP:journals/corr/abs-2105-15203,
  author    = {Enze Xie and
               Wenhai Wang and
               Zhiding Yu and
               Anima Anandkumar and
               Jose M. Alvarez and
               Ping Luo},
  title     = {SegFormer: Simple and Efficient Design for Semantic Segmentation with
               Transformers},
  journal   = {CoRR},
  volume    = {abs/2105.15203},
  year      = {2021},
  url       = {https://arxiv.org/abs/2105.15203},
  eprinttype = {arXiv},
  eprint    = {2105.15203},
  timestamp = {Wed, 02 Jun 2021 11:46:42 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2105-15203.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Property	Details
Model Type	SegFormer (b0-sized) fine-tuned on Cityscapes
Training Data	Cityscapes
Tags	vision, image-segmentation
