SegFormer-b0 Open-source Semantic Segmentation Model - Free Deployment for Precise Segmentation of Urban Landscape Images

Segformer B0 Finetuned Cityscapes 768 768

Developed by nvidia

SegFormer is a Transformer-based semantic segmentation model fine-tuned on the CityScapes dataset, suitable for semantic segmentation tasks in urban scene images.

Image Segmentation

Transformers

Open Source License: Other #Urban Scene Segmentation #Transformer Architecture #Lightweight MLP Decoder Head

Downloads 566

Release Time: 3/2/2022

Model Overview

This model employs a hierarchical Transformer encoder and a lightweight all-MLP decoder head design, optimized for semantic segmentation of 768x768 resolution urban scene images, demonstrating excellent performance on benchmarks like CityScapes.

Model Features

Hierarchical Transformer Architecture

Utilizes a hierarchical Transformer encoder to effectively capture multi-scale feature information.
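To make "multi-scale" concrete, the short sketch below lists the feature-map shape each encoder stage would produce for a 768x768 input. The stride values (4, 8, 16, 32) and the b0 channel widths (32, 64, 160, 256) are assumptions taken from the SegFormer paper and the usual b0 configuration, not from this page, and `stage_shapes` is a hypothetical helper name.

```python
def stage_shapes(height, width,
                 strides=(4, 8, 16, 32),        # assumed per-stage downsampling factors
                 channels=(32, 64, 160, 256)):  # assumed b0 channel widths
    """Return (channels, height, width) of the feature map after each encoder stage."""
    return [(c, height // s, width // s) for s, c in zip(strides, channels)]


for index, shape in enumerate(stage_shapes(768, 768), start=1):
    print(f"stage {index}: {shape}")
# stage 1: (32, 192, 192)
# stage 2: (64, 96, 96)
# stage 3: (160, 48, 48)
# stage 4: (256, 24, 24)
```

Early stages keep fine spatial detail at low channel width, while later stages trade resolution for wider, more semantic features.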

Lightweight MLP Decoder Head

Features an all-MLP decoder head design, maintaining high performance while reducing computational complexity.
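As a rough illustration of what an all-MLP decoder head does, the PyTorch sketch below projects each encoder stage to a shared width, upsamples everything to the 1/4-resolution grid, concatenates, fuses, and classifies per pixel. It is a simplified sketch, not NVIDIA's implementation: the class name `AllMLPHead` is hypothetical, the channel widths and embedding size are assumed b0 values, and details such as normalization and dropout are omitted.

```python
import torch
from torch import nn
import torch.nn.functional as F


class AllMLPHead(nn.Module):
    """Illustrative all-MLP decode head (simplified; assumed b0 sizes)."""

    def __init__(self, in_dims=(32, 64, 160, 256), embed_dim=256, num_labels=19):
        super().__init__()
        # One linear projection per encoder stage, mapping to a shared width.
        self.proj = nn.ModuleList([nn.Linear(d, embed_dim) for d in in_dims])
        # A 1x1 convolution acts as a per-pixel linear layer over the fused features.
        self.fuse = nn.Conv2d(embed_dim * len(in_dims), embed_dim, kernel_size=1)
        self.classify = nn.Conv2d(embed_dim, num_labels, kernel_size=1)

    def forward(self, features):
        target = tuple(features[0].shape[2:])  # spatial size of the 1/4-resolution stage
        upsampled = []
        for feat, proj in zip(features, self.proj):
            b, _, h, w = feat.shape
            x = proj(feat.flatten(2).transpose(1, 2))   # (B, h*w, embed_dim)
            x = x.transpose(1, 2).reshape(b, -1, h, w)  # back to (B, embed_dim, h, w)
            upsampled.append(
                F.interpolate(x, size=target, mode="bilinear", align_corners=False)
            )
        fused = torch.relu(self.fuse(torch.cat(upsampled, dim=1)))
        return self.classify(fused)  # (B, num_labels, H/4, W/4)


# Random stand-ins for the four encoder outputs of a 128x128 input.
features = [torch.randn(1, c, 128 // s, 128 // s)
            for c, s in zip((32, 64, 160, 256), (4, 8, 16, 32))]
head = AllMLPHead()
print(head(features).shape)  # torch.Size([1, 19, 32, 32])
```

The head contains only linear layers and 1x1 convolutions, with no heavy context modules, which is where the reduced computational cost comes from.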

High-Resolution Support

Specifically optimized for 768x768 high-resolution images, ideal for urban scene analysis.

Model Capabilities

Image Semantic Segmentation

Urban Scene Analysis

Road Scene Understanding

Use Cases

Intelligent Transportation

Road Scene Segmentation

Used for identifying and segmenting elements like roads, vehicles, and pedestrians in autonomous driving systems.

Performs excellently on the CityScapes dataset

Urban Planning

Urban Scene Analysis

Used to analyze the distribution of urban elements such as buildings, roads, and green spaces.
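One simple way to quantify such a distribution is to count how many pixels of a predicted label map fall into each class. The sketch below assumes the label map uses the 19 standard Cityscapes training classes in their usual order (check the model's `config.id2label` before relying on it); `class_distribution` and the toy label map are hypothetical.

```python
import numpy as np

# Assumed class order: the 19 standard Cityscapes training classes.
CITYSCAPES_CLASSES = [
    "road", "sidewalk", "building", "wall", "fence", "pole",
    "traffic light", "traffic sign", "vegetation", "terrain", "sky",
    "person", "rider", "car", "truck", "bus", "train", "motorcycle", "bicycle",
]


def class_distribution(label_map, class_names=CITYSCAPES_CLASSES):
    """Return the fraction of pixels per class, skipping classes that do not occur."""
    counts = np.bincount(label_map.ravel(), minlength=len(class_names))
    fractions = counts / counts.sum()
    return {name: float(frac) for name, frac in zip(class_names, fractions) if frac > 0}


# Toy 4x4 label map standing in for a real per-pixel prediction.
toy = np.array([
    [0, 0, 2, 2],
    [0, 0, 2, 8],
    [0, 1, 8, 8],
    [0, 1, 13, 10],
])
print(class_distribution(toy))
# {'road': 0.375, 'sidewalk': 0.125, 'building': 0.1875,
#  'vegetation': 0.1875, 'sky': 0.0625, 'car': 0.0625}
```

Pixel fractions measure area in the image, not on the ground, so comparing them across scenes only makes sense for similar camera viewpoints.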

🚀 SegFormer (b0-sized) model fine-tuned on CityScapes

A SegFormer model fine-tuned on the CityScapes dataset at a resolution of 768x768, offering high-performance semantic segmentation.

🚀 Quick Start

This SegFormer model is fine-tuned on the CityScapes dataset at a resolution of 768x768. It was introduced in the paper SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers by Xie et al. and first released in this repository.

Disclaimer: The team releasing SegFormer did not write a model card for this model, so this model card has been written by the Hugging Face team.

✨ Features

SegFormer consists of a hierarchical Transformer encoder and a lightweight all-MLP decode head, and achieves strong results on semantic segmentation benchmarks such as ADE20K and Cityscapes. The hierarchical Transformer is first pre-trained on ImageNet-1k, after which a decode head is added and the whole model is fine-tuned on a downstream dataset.

📚 Documentation

Intended uses & limitations

You can use the raw model for semantic segmentation. See the model hub to look for fine-tuned versions on a task that interests you.

How to use

Here is how to use this model to segment an image from the COCO 2017 dataset into the Cityscapes classes it was fine-tuned on:

from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation
from PIL import Image
import requests

# Load the image processor and the fine-tuned model from the Hub.
# SegformerFeatureExtractor is the older, deprecated name for the image processor.
processor = SegformerImageProcessor.from_pretrained("nvidia/segformer-b0-finetuned-cityscapes-768-768")
model = SegformerForSemanticSegmentation.from_pretrained("nvidia/segformer-b0-finetuned-cityscapes-768-768")

# Download a sample image.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Preprocess the image, run the model, and read the per-class logits.
inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits  # shape (batch_size, num_labels, height/4, width/4)
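Because the logits come out at a quarter of the input resolution, a common next step is to upsample them and take the per-pixel argmax to obtain a label map. The sketch below shows that step on random stand-in logits so it runs without downloading the model. The helper name `logits_to_label_map`, the 19-class count, and the 768x768 target size are assumptions for illustration.

```python
import torch
import torch.nn.functional as F


def logits_to_label_map(logits, target_size):
    """Upsample (B, C, h, w) logits to target_size=(H, W), then pick the best class per pixel."""
    upsampled = F.interpolate(logits, size=target_size, mode="bilinear", align_corners=False)
    return upsampled.argmax(dim=1)  # (B, H, W) tensor of class indices


# Stand-in for outputs.logits: 19 classes on a 192x192 grid (768 / 4).
fake_logits = torch.randn(1, 19, 192, 192)
label_map = logits_to_label_map(fake_logits, (768, 768))
print(label_map.shape)  # torch.Size([1, 768, 768])
```

With a real image, `target_size` would typically be `image.size[::-1]`, since PIL reports (width, height). Recent versions of transformers also appear to offer a `post_process_semantic_segmentation` helper on the image processor for the same purpose.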

For more code examples, refer to the documentation.

📄 License

The license for this model can be found here.

BibTeX entry and citation info

@article{DBLP:journals/corr/abs-2105-15203,
  author    = {Enze Xie and
               Wenhai Wang and
               Zhiding Yu and
               Anima Anandkumar and
               Jose M. Alvarez and
               Ping Luo},
  title     = {SegFormer: Simple and Efficient Design for Semantic Segmentation with
               Transformers},
  journal   = {CoRR},
  volume    = {abs/2105.15203},
  year      = {2021},
  url       = {https://arxiv.org/abs/2105.15203},
  eprinttype = {arXiv},
  eprint    = {2105.15203},
  timestamp = {Wed, 02 Jun 2021 11:46:42 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2105-15203.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Property	Details
Model Type	SegFormer (b0-sized) fine-tuned on CityScapes
Training Data	Cityscapes
Tags	vision, image-segmentation
Example Image	Road
