SegFormer-b1 Open-source Model - Fine-tuned on the CityScapes Dataset, Suitable for 1024x1024 Resolution Scenarios

Segformer B1 Finetuned Cityscapes 1024 1024

Developed by nvidia

This SegFormer model is fine-tuned on the CityScapes dataset at 1024x1024 resolution, featuring a hierarchical Transformer encoder and lightweight all-MLP decoder head architecture.

Image Segmentation

Transformers

Open Source License: Other #Urban Scene Segmentation #Transformer Architecture #1024x1024 High Resolution

Release Time: 3/2/2022

Model Overview

SegFormer is a Transformer-based semantic segmentation model designed for simplicity and efficiency, suitable for tasks like urban scene segmentation.

Model Features

Efficient Design

Utilizes a hierarchical Transformer encoder and lightweight all-MLP decoder head architecture, achieving excellent semantic segmentation performance while maintaining efficiency.

High-Resolution Support

Fine-tuned at 1024x1024 resolution, which suits detailed scenes such as urban streets.

Pre-training + Fine-tuning

The hierarchical Transformer encoder is pre-trained on ImageNet-1k; a decode head is then added and the model is fine-tuned on a downstream dataset (Cityscapes for this checkpoint).

Model Capabilities

Image Semantic Segmentation

Urban Scene Analysis

Road Recognition

Use Cases

Intelligent Transportation

Road Segmentation

Identify and segment urban road areas

Sample images demonstrate effective segmentation of road areas by the model

Urban Planning

Urban Scene Analysis

Perform semantic segmentation on urban scenes to identify different regions and objects
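
The road-segmentation use case above boils down to selecting one class from a per-pixel label map. The following is a minimal sketch, not part of the original card. It assumes the label map is a 2D grid of class indices and that index 0 means "road", as in the usual Cityscapes training-ID order; check `model.config.id2label` before relying on that. The function name `road_fraction` is hypothetical.

```python
def road_fraction(label_map, road_id=0):
    """Return the share of pixels labelled as road.

    label_map: 2D list of integer class indices (one per pixel).
    road_id:   class index for "road" (assumed to be 0; verify via id2label).
    """
    total = sum(len(row) for row in label_map)
    if total == 0:
        return 0.0
    road = sum(1 for row in label_map for value in row if value == road_id)
    return road / total


# Tiny hand-made label map: 0 = road (assumed), other values = other classes.
example = [
    [10, 10, 2, 2],
    [0, 0, 0, 2],
    [0, 0, 0, 0],
]
print(road_fraction(example))
```

The same comparison (`label == road_id`) applied to a full-resolution prediction gives a binary road mask.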

🚀 SegFormer (b1-sized) model fine-tuned on CityScapes

A SegFormer model fine-tuned on the CityScapes dataset at a resolution of 1024x1024, offering high-performance semantic segmentation.

🚀 Quick Start

This SegFormer model is fine-tuned on the CityScapes dataset at a resolution of 1024x1024. It was first introduced in the paper SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers by Xie et al. and initially released in this repository.

Disclaimer: The team that released SegFormer did not write a model card for this model, so this model card has been created by the Hugging Face team.

✨ Features

SegFormer combines a hierarchical Transformer encoder and a lightweight all-MLP decode head. This design enables it to achieve excellent results on semantic segmentation benchmarks like ADE20K and Cityscapes. The hierarchical Transformer is pre-trained on ImageNet-1k first, and then a decode head is added and fine-tuned on a downstream dataset.
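
To make "hierarchical" concrete: in the SegFormer paper the encoder emits four feature maps at strides 4, 8, 16 and 32 relative to the input. The sketch below computes the resulting shapes for a 1024x1024 image. The channel widths (64, 128, 320, 512) are the MiT-b1 values as given in the paper, not something stated on this page, and `stage_shapes` is an illustrative helper rather than a library function.

```python
# Illustrative arithmetic only; no model is loaded here.
STRIDES = (4, 8, 16, 32)            # downsampling factor of each encoder stage
B1_CHANNELS = (64, 128, 320, 512)   # assumed MiT-b1 channel widths


def stage_shapes(height, width, strides=STRIDES, channels=B1_CHANNELS):
    """Return (channels, stage_height, stage_width) for each encoder stage."""
    return [(c, height // s, width // s) for c, s in zip(channels, strides)]


for index, shape in enumerate(stage_shapes(1024, 1024), start=1):
    print(f"stage {index}: {shape}")
```

The decode head brings these maps to the stride-4 resolution before fusing them, which matches the `height/4, width/4` logits shape noted in the usage snippet.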

📚 Documentation

Intended uses & limitations

You can utilize the raw model for semantic segmentation. Check out the model hub to find fine-tuned versions for tasks that interest you.

How to use

Here's how to run semantic segmentation on an image from the COCO 2017 dataset using this model:

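from transformers import SegformerFeatureExtractor, SegformerForSemanticSegmentation
from PIL import Image
import requests

# Load the preprocessing config and the fine-tuned weights from the Hub.
# Note: recent transformers releases name the preprocessor SegformerImageProcessor;
# SegformerFeatureExtractor is the older name and may emit a deprecation warning.
feature_extractor = SegformerFeatureExtractor.from_pretrained("nvidia/segformer-b1-finetuned-cityscapes-1024-1024")
model = SegformerForSemanticSegmentation.from_pretrained("nvidia/segformer-b1-finetuned-cityscapes-1024-1024")

# Download a sample image (any RGB image works).
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Preprocess the image, run the model, and read the per-class logits.
inputs = feature_extractor(images=image, return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits  # shape (batch_size, num_labels, height/4, width/4)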

For more code examples, refer to the documentation.
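
Because the logits come out at a quarter of the input resolution, a common follow-up step is to upsample them to the image size and take the per-pixel argmax. The sketch below shows one way to do that with plain PyTorch. It uses random numbers in place of `outputs.logits` so it runs without downloading the checkpoint, and it assumes 19 classes (the Cityscapes evaluation set; confirm with `model.config.num_labels`). `logits_to_label_map` is a hypothetical helper name.

```python
import torch
import torch.nn.functional as F


def logits_to_label_map(logits, target_size):
    """Upsample (batch, classes, h, w) logits to target_size=(H, W), then argmax."""
    upsampled = F.interpolate(
        logits, size=target_size, mode="bilinear", align_corners=False
    )
    # Pick the highest-scoring class for every pixel.
    return upsampled.argmax(dim=1)


# Stand-in for `outputs.logits`: 19 assumed classes at 1/4 of a 1024x1024 input.
dummy_logits = torch.randn(1, 19, 256, 256)
label_map = logits_to_label_map(dummy_logits, (1024, 1024))
print(label_map.shape)
```

With a PIL image, `image.size` is `(width, height)`, so the target size for the real model output would be `image.size[::-1]`.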

📄 License

The license for this model can be found here.

BibTeX entry and citation info

@article{DBLP:journals/corr/abs-2105-15203,
  author    = {Enze Xie and
               Wenhai Wang and
               Zhiding Yu and
               Anima Anandkumar and
               Jose M. Alvarez and
               Ping Luo},
  title     = {SegFormer: Simple and Efficient Design for Semantic Segmentation with
               Transformers},
  journal   = {CoRR},
  volume    = {abs/2105.15203},
  year      = {2021},
  url       = {https://arxiv.org/abs/2105.15203},
  eprinttype = {arXiv},
  eprint    = {2105.15203},
  timestamp = {Wed, 02 Jun 2021 11:46:42 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2105-15203.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Property	Details
Model Type	SegFormer (b1-sized) model fine-tuned on CityScapes
Training Data	Cityscapes
