SegFormer Open-Source Image Segmentation Model - Free Deployment, Precise Processing of 1024x1024 Resolution Images

Segformer B2 Finetuned Cityscapes 1024 1024

Developed by NVIDIA

SegFormer is a Transformer-based semantic segmentation model, fine-tuned on the Cityscapes dataset at a resolution of 1024x1024.

Image Segmentation

Transformers

Open Source License: Other

Tags: Urban Scene Segmentation, Transformer Architecture, 1024 High Resolution

Downloads 2,179

Release Time: 3/2/2022

Model Overview

This model employs a hierarchical Transformer encoder with a lightweight all-MLP decoder head, specifically designed for semantic segmentation tasks, excelling in urban scenes and similar scenarios.

Model Features

Efficient Transformer Architecture

Uses a hierarchical Transformer encoder to achieve excellent semantic segmentation performance while maintaining efficiency.
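To make "hierarchical" concrete, the sketch below computes the feature-map size at each encoder stage for a 1024x1024 input. It assumes the four stages downsample by 4, 8, 16 and 32, as described in the SegFormer paper; the helper name `stage_resolutions` is hypothetical and not part of any library.

```python
# Assumption: the four encoder stages downsample the input by 4, 8, 16 and 32,
# as described in the SegFormer paper. Verify against the model config.
ENCODER_STRIDES = (4, 8, 16, 32)


def stage_resolutions(height, width, strides=ENCODER_STRIDES):
    """Return the (height, width) of each encoder stage's feature map.

    Floor division is exact only when height and width are divisible by 32.
    """
    return [(height // s, width // s) for s in strides]


for stage, (h, w) in enumerate(stage_resolutions(1024, 1024), start=1):
    print(f"stage {stage}: {h} x {w}")
```

The first stage's 1/4-resolution grid matches the logits shape noted in the usage example, where height and width are divided by 4.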

Lightweight MLP Decoder Head

Employs an all-MLP structured decoder head, making it more lightweight and efficient compared to traditional decoders.
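As a rough illustration of the all-MLP idea, the PyTorch sketch below projects each encoder stage to a shared width with a linear layer, upsamples everything to the finest stage, concatenates, and classifies with two more linear layers. It is a simplified sketch and not the released implementation: the class name `AllMLPHeadSketch` is hypothetical, and the stage widths (64, 128, 320, 512), the embedding width of 256, and the 19 classes are assumptions chosen for the demo.

```python
import torch
from torch import nn
import torch.nn.functional as F


class AllMLPHeadSketch(nn.Module):
    """Simplified all-MLP decode head (illustrative, not the released code)."""

    def __init__(self, in_dims=(64, 128, 320, 512), embed_dim=256, num_classes=19):
        super().__init__()
        # One linear layer per encoder stage maps its channels to a shared width.
        self.project = nn.ModuleList([nn.Linear(d, embed_dim) for d in in_dims])
        # A linear layer fuses the concatenated stages; another predicts classes.
        self.fuse = nn.Linear(embed_dim * len(in_dims), embed_dim)
        self.classify = nn.Linear(embed_dim, num_classes)

    def forward(self, features):
        # features: list of (batch, channels, height, width), finest stage first.
        target_size = features[0].shape[-2:]
        upsampled = []
        for feature, project in zip(features, self.project):
            # Apply the linear layer over the channel axis, then restore NCHW.
            x = project(feature.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
            x = F.interpolate(x, size=target_size, mode="bilinear", align_corners=False)
            upsampled.append(x)
        fused = torch.cat(upsampled, dim=1).permute(0, 2, 3, 1)
        fused = torch.relu(self.fuse(fused))
        return self.classify(fused).permute(0, 3, 1, 2)


# Dummy features for a 64x64 input at strides 4, 8, 16 and 32.
features = [torch.randn(1, c, 64 // s, 64 // s)
            for c, s in zip((64, 128, 320, 512), (4, 8, 16, 32))]
logits = AllMLPHeadSketch()(features)
print(logits.shape)  # torch.Size([1, 19, 16, 16])
```

The head has no convolutions with large kernels and no attention, which is why it stays light compared with heavier decoders.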

High-Resolution Support

Fine-tuned at 1024x1024 resolution, which suits segmentation tasks that need fine spatial detail.

Model Capabilities

Image Semantic Segmentation

Urban Scene Recognition

Road Scene Parsing

Use Cases

Intelligent Transportation

Road Scene Segmentation

Performs pixel-level semantic segmentation of urban road scenes, identifying elements such as roads, vehicles, and pedestrians.

Performs excellently on the Cityscapes dataset.

Urban Planning

Urban Landscape Analysis

Analyzes urban landscape composition, identifying regions such as buildings, green spaces, and roads.
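One simple way to summarize landscape composition is to count the share of pixels assigned to each class. The sketch below assumes the standard 19 Cityscapes training classes in their usual order, which should be checked against `model.config.id2label`; the helper `class_fractions` and the tiny label map are hypothetical.

```python
from collections import Counter

# Assumption: the standard 19 Cityscapes training classes in their usual order.
# Check this against model.config.id2label before relying on it.
CITYSCAPES_CLASSES = [
    "road", "sidewalk", "building", "wall", "fence", "pole",
    "traffic light", "traffic sign", "vegetation", "terrain", "sky",
    "person", "rider", "car", "truck", "bus", "train", "motorcycle", "bicycle",
]


def class_fractions(label_map, class_names=CITYSCAPES_CLASSES):
    """Return {class name: fraction of pixels} for a 2D grid of class ids."""
    counts = Counter(class_id for row in label_map for class_id in row)
    total = sum(counts.values())
    return {class_names[i]: n / total for i, n in sorted(counts.items())}


# Hypothetical 2x4 label map: road (0), building (2), vegetation (8).
label_map = [
    [2, 2, 8, 8],
    [0, 0, 0, 8],
]
print(class_fractions(label_map))
```

For a real prediction, the nested list would be replaced by the per-pixel class ids produced by the model, for example a tensor converted with `.tolist()`.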

🚀 SegFormer (b2-sized) model fine-tuned on Cityscapes

A SegFormer model fine-tuned on the Cityscapes dataset at a resolution of 1024x1024, offering high-performance semantic segmentation.

🚀 Quick Start

This SegFormer model is fine-tuned on the Cityscapes dataset at a resolution of 1024x1024. It was introduced in the paper SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers by Xie et al. and first released in this repository.

Disclaimer: The team releasing SegFormer did not write a model card for this model so this model card has been written by the Hugging Face team.

✨ Features

SegFormer consists of a hierarchical Transformer encoder and a lightweight all-MLP decode head to achieve great results on semantic segmentation benchmarks such as ADE20K and Cityscapes. The hierarchical Transformer is first pre-trained on ImageNet-1k, after which a decode head is added and the whole model is fine-tuned on a downstream dataset.

📚 Documentation

Intended uses & limitations

You can use the raw model for semantic segmentation. See the model hub to look for fine-tuned versions on a task that interests you.

💻 Usage Examples

Basic Usage

Here is how to use this model to segment an image from the COCO 2017 dataset into the Cityscapes classes:

from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation
from PIL import Image
import requests

# SegformerImageProcessor supersedes the deprecated SegformerFeatureExtractor;
# on older transformers releases, import SegformerFeatureExtractor instead.
checkpoint = "nvidia/segformer-b2-finetuned-cityscapes-1024-1024"
processor = SegformerImageProcessor.from_pretrained(checkpoint)
model = SegformerForSemanticSegmentation.from_pretrained(checkpoint)

# Download a sample image from the COCO 2017 validation set.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Preprocess the image, then run a forward pass.
inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits  # shape (batch_size, num_labels, height/4, width/4)
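Because the logits come out at a quarter of the input resolution, a common follow-up step is to upsample them to the image size and take the per-pixel argmax. The sketch below shows one way to do that with `torch.nn.functional.interpolate`; the helper name `logits_to_label_map` is hypothetical, and random logits stand in for `outputs.logits` so the block runs without downloading the model.

```python
import torch
import torch.nn.functional as F


def logits_to_label_map(logits, target_size):
    """Upsample (batch, classes, h, w) logits to target_size and take the argmax."""
    upsampled = F.interpolate(
        logits, size=target_size, mode="bilinear", align_corners=False
    )
    return upsampled.argmax(dim=1)  # (batch, height, width) of class ids


# Dummy logits standing in for outputs.logits: 19 classes at 1/4 of 1024x1024.
dummy_logits = torch.randn(1, 19, 256, 256)
label_map = logits_to_label_map(dummy_logits, target_size=(1024, 1024))
print(label_map.shape)  # torch.Size([1, 1024, 1024])
```

With a PIL image, `target_size` would be `image.size[::-1]`, since PIL reports (width, height). Recent transformers releases also appear to offer a `post_process_semantic_segmentation` helper on the image processor; check the documentation for its exact arguments.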

For more code examples, see the documentation.

📄 License

The license for this model can be found here.

BibTeX entry and citation info

@article{DBLP:journals/corr/abs-2105-15203,
  author    = {Enze Xie and
               Wenhai Wang and
               Zhiding Yu and
               Anima Anandkumar and
               Jose M. Alvarez and
               Ping Luo},
  title     = {SegFormer: Simple and Efficient Design for Semantic Segmentation with
               Transformers},
  journal   = {CoRR},
  volume    = {abs/2105.15203},
  year      = {2021},
  url       = {https://arxiv.org/abs/2105.15203},
  eprinttype = {arXiv},
  eprint    = {2105.15203},
  timestamp = {Wed, 02 Jun 2021 11:46:42 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2105-15203.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Property	Details
Model Type	Vision, Image Segmentation
Training Data	Cityscapes
